【ORA-00312】Redo log文件损坏故障解决

 

故障现象:数据库hung住

查看alert日志:

Fri Jul 10 09:22:20 2015

Errors in file /opt/oracle/admin/orclbj/bdump/orclbj_arc1_1264.trc:

ORA-00354: Message 354 not found; No message file for product=RDBMS, facility=ORA

ORA-00353: Message 353 not found; No message file for product=RDBMS, facility=ORA; arguments: [103705] [8411181136] [07/10/2015 01:13:32]

ORA-00312: Message 312 not found; No message file for product=RDBMS, facility=ORA; arguments: [3] [1] [/opt/oracle/oradata/orclbj/redo03.log]

ARC1: All Archive destinations made inactive due to error 354

Fri Jul 10 09:22:20 2015

ARC1: Closing local archive destination LOG_ARCHIVE_DEST_1: ‘/opt/oracle/arch/1_43884_752778831.dbf’ (error 354)

(orclbj)

ARC1: Failed to archive thread 1 sequence 43884 (354)

ARCH: Archival stopped, error occurred. Will continue retrying

Fri Jul 10 09:22:20 2015

ORACLE Instance orclbj – Archival Error

Fri Jul 10 09:22:20 2015

ORA-16038: Message 16038 not found; No message file for product=RDBMS, facility=ORA; arguments: [3] [43884]

ORA-00354: Message 354 not found; No message file for product=RDBMS, facility=ORA

ORA-00312: Message 312 not found; No message file for product=RDBMS, facility=ORA; arguments: [3] [1] [/opt/oracle/oradata/orclbj/redo03.log]

Fri Jul 10 09:22:20 2015

Errors in file /opt/oracle/admin/orclbj/bdump/orclbj_arc1_1264.trc:

ORA-16038: Message 16038 not found; No message file for product=RDBMS, facility=ORA; arguments: [3] [43884]

ORA-00354: Message 354 not found; No message file for product=RDBMS, facility=ORA

ORA-00312: Message 312 not found; No message file for product=RDBMS, facility=ORA; arguments: [3] [1] [/opt/oracle/oradata/orclbj/redo03.log]

Fri Jul 10 09:22:20 2015

ARC0: Archiving not possible: No primary destinations

ARC0: Failed to archive thread 1 sequence 43884 (4)

ARCH: Archival stopped, error occurred. Will continue retrying

Fri Jul 10 09:22:20 2015

ORACLE Instance orclbj – Archival Error

Fri Jul 10 09:22:20 2015

ORA-16014: Message 16014 not found; No message file for product=RDBMS, facility=ORA; arguments: [3] [43884]

ORA-00312: Message 312 not found; No message file for product=RDBMS, facility=ORA; arguments: [3] [1] [/opt/oracle/oradata/orclbj/redo03.log]

Fri Jul 10 09:22:20 2015

Errors in file /opt/oracle/admin/orclbj/bdump/orclbj_arc0_1262.trc:

ORA-16014: Message 16014 not found; No message file for product=RDBMS, facility=ORA; arguments: [3] [43884]

ORA-00312: Message 312 not found; No message file for product=RDBMS, facility=ORA; arguments: [3] [1] [/opt/oracle/oradata/orclbj/redo03.log]

Fri Jul 10 09:23:20 2015

ARC0: Archiving not possible: No primary destinations

ARC0: Failed to archive thread 1 sequence 43884 (4)

Fri Jul 10 09:24:20 2015

ARC1: Archiving not possible: No primary destinations

ARC1: Failed to archive thread 1 sequence 43884 (4)

1、先从 ARC0: Archiving not possible: No primary destinations 这条日志排查起,检查数据库归档路径设置

SQL> show parameter arch;

NAME                                 TYPE        VALUE

———————————— ———– ——————————

archive_lag_target                   integer     0

log_archive_config                   string

log_archive_dest                     string

log_archive_dest_1                   string      LOCATION=/opt/oracle/arch

log_archive_dest_10                  string

log_archive_dest_2                   string

log_archive_dest_3                   string

log_archive_dest_4                   string

log_archive_dest_5                   string

log_archive_dest_6                   string

log_archive_dest_7                   string

2、日志路径设置正常,磁盘空间、读取权限经检查也都正常。

接着继续排查

SQL> select error,dest_id,dest_name,status from v$archive_dest;

ERROR         DEST_ID DEST_NAME            STATUS

———- ———- ——————– ———

ORA-00354:          1 LOG_ARCHIVE_DEST_1   ERROR

corrupt

redo log

block

header

                    2 LOG_ARCHIVE_DEST_2   INACTIVE

3 LOG_ARCHIVE_DEST_3   INACTIVE

4 LOG_ARCHIVE_DEST_4   INACTIVE

5 LOG_ARCHIVE_DEST_5   INACTIVE

6 LOG_ARCHIVE_DEST_6   INACTIVE

ERROR         DEST_ID DEST_NAME            STATUS

———- ———- ——————– ———

7 LOG_ARCHIVE_DEST_7   INACTIVE

8 LOG_ARCHIVE_DEST_8   INACTIVE

9 LOG_ARCHIVE_DEST_9   INACTIVE

10 LOG_ARCHIVE_DEST_10  INACTIVE

10 rows selected.

3、显示归档路径状态错误,并报ORA-00354错误。怀疑可能是redo log数据块损坏导致的状态异常。

再次分析alert日志的其他报错,发现 

都是 ORA-00312: Message 312 not found; No message file for product=RDBMS, facility=ORA; arguments: [3] [1] [/opt/oracle/oradata/orclbj/redo03.log]

这样的条目,随后检查redo log 日志情况。

SQL> select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES    MEMBERS ARC STATUS

———- ———- ———- ———- ———- — ———-

FIRST_CHANGE# FIRST_TIME

————- ———-

1          1      43886  314572800          1 NO  CURRENT

8411417007 10-7? -15

         2          1      43883  314572800          1 YES INACTIVE

8411172346 10-7? -15

         3          1      43884  314572800          1 NO  INACTIVE

8411172351 10-7? -15

4、发现日志序列号为43884与alert日志中报错序列号相符 ARC0: Failed to archive thread 1 sequence 43884 (4)

直接执行删除日志组命令

SQL> alter database drop logfile group 3;

alter database drop logfile group 3

*

ERROR at line 1:

ORA-00350: log 3 of instance orclbj (thread 1) needs to be archived

ORA-00312: online log 3 thread 1: ‘/opt/oracle/oradata/orclbj/redo03.log’

还好日志组3处于INACTIVE状态,我们可以直接将其进行不归档清除

SQL> alter database clear unarchived logfile group 3;

alter database clear unarchived logfile group 3

*

ERROR at line 1:

ORA-27052: unable to flush file data

Linux Error: 5: Input/output error

Additional information: 1

5、再次查询日志状态

SQL> select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES    MEMBERS ARC STATUS

———- ———- ———- ———- ———- — ———-

FIRST_CHANGE# FIRST_TIME

————- ———-

1          1      43886  314572800          1 NO  CURRENT

8411417007 10-7? -15

         2          1      43883  314572800          1 YES INACTIVE

8411172346 10-7? -15

         3          1          0  314572800          1 YES CLEARING

8411172351 10-7? -15

    GROUP#    THREAD#  SEQUENCE#      BYTES    MEMBERS ARC STATUS

———- ———- ———- ———- ———- — ———-

FIRST_CHANGE# FIRST_TIME

————- ———-

4          1      43885  314572800          1 NO  INACTIVE

8411315692 10-7? -15

6、日志进入CLEARING状态,不过这次可以直接执行drop日志组了

SQL> ALTER DATABASE DROP LOGFILE GROUP 3;

Database altered.

SQL> select * FROM v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES    MEMBERS ARC STATUS

———- ———- ———- ———- ———- — ———-

FIRST_CHANGE# FIRST_TIME

————- ———-

1          1      43886  314572800          1 NO  CURRENT

8411417007 10-7? -15

         2          1      43883  314572800          1 YES INACTIVE

8411172346 10-7? -15

         4          1      43885  314572800          1 YES INACTIVE

8411315692 10-7? -15

SQL> alter system switch logfile;

System altered.

SQL>  select * FROM v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES    MEMBERS ARC STATUS

———- ———- ———- ———- ———- — ———-

FIRST_CHANGE# FIRST_TIME

————- ———-

1          1      43886  314572800          1 YES ACTIVE

8411417007 10-7? -15

         2          1      43887  314572800          1 NO  CURRENT

8411448686 10-7? -15

         4          1      43885  314572800          1 YES INACTIVE

8411315692 10-7? -15

7、重新添加日志组3

SQL> alter database add logfile group 3’/opt/oracle/oradata/orclbj/redo03.log’ size 300M;

Database altered.

8、注意,稍后请进行一次数据库的全备,至此问题解决。

发表评论