Two Experiments with Oracle Standby Redo Logs

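The standby redo log (SRL) is an important component of Oracle Data Guard. In my view, the standby redo log is effectively the online redo log that a physical standby uses for data synchronization. Redo can still be shipped as archived logs without one, but for the standby to receive redo as it is generated and to use real-time apply, it needs one or more standby redo log groups.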

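Depending on the protection mode, the primary and the standby maintain a particular synchronization relationship. The difference shows up mainly in how transactions on the primary are treated once the network connection or the apply process is interrupted. So, in the default protection mode (Maximum Performance), if the primary keeps sending new redo to the standby until the standby redo logs fill up or have all been switched through, how does Oracle behave? The experiments below check this.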

1. Environment

I tested on Oracle 11gR2, version 11.2.0.4. The primary and the standby are already set up, and Redo Apply is running normally.

The primary looks like this:

SQL> select open_mode, database_role from v$database;

OPEN_MODE            DATABASE_ROLE

-------------------- ----------------

READ WRITE          PRIMARY

SQL> select group#, sequence#, archived, status from v$log;

    GROUP#  SEQUENCE# ARCHIVED STATUS

---------- ---------- -------- ----------------

        1        37 NO      CURRENT

        2        35 YES      INACTIVE

        3        36 YES      INACTIVE

SQL> select recid, sequence#, ARCHIVED, APPLIED, DELETED from v$archived_log where name='vlifesb';

    RECID  SEQUENCE# ARCHIVED APPLIED  DELETED

---------- ---------- -------- --------- -------

(rows omitted for brevity ...)

        20        31 YES      YES      NO

        22        32 YES      YES      NO

        24        33 YES      YES      NO

        26        34 YES      YES      NO

        28        35 YES      YES      NO

        30        36 YES      NO        NO

15 rows selected

The standby looks like this:

SQL> select open_mode, database_role from v$database;

OPEN_MODE            DATABASE_ROLE

-------------------- ----------------

READ ONLY WITH APPLY PHYSICAL STANDBY

SQL> select group#, dbid, sequence#, used, archived, status from v$standby_log;

    GROUP# DBID                  SEQUENCE#      USED ARCHIVED STATUS

---------- -------------------- ---------- ---------- -------- ----------

        4 4207470439                  37    6491648 YES      ACTIVE

        5 UNASSIGNED                    0          0 NO      UNASSIGNED

        6 UNASSIGNED                    0          0 YES      UNASSIGNED

SQL> select recid, sequence#, ARCHIVED, APPLIED, DELETED from v$archived_log where name is not null;

    RECID  SEQUENCE# ARCHIVED APPLIED  DELETED

---------- ---------- -------- --------- -------

        11        32 YES      YES      NO

        12        33 YES      YES      NO

        13        34 YES      YES      NO

        14        35 YES      YES      NO

        15        36 YES      IN-MEMORY NO

Both sides are currently in sync. The standby redo log is on sequence 37, matching the current redo log on the primary.
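
The comparison above is read off the full listings. As a rough shortcut, the same check can be sketched with two short queries. This is only an illustration, assuming a single incarnation (no recent RESETLOGS), so that sequence numbers are directly comparable.

```sql
-- On the primary: the sequence currently being written, per thread.
select thread#, sequence# as current_seq
  from v$log
 where status = 'CURRENT';

-- On the standby: the highest sequence fully applied, per thread.
-- Rows still marked NO or IN-MEMORY are deliberately excluded.
select thread#, max(sequence#) as last_applied_seq
  from v$archived_log
 where applied = 'YES'
 group by thread#;
```

With real-time apply keeping up, the second value would normally trail the first by one or two, because the newest archived log is typically still shown as IN-MEMORY.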

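2. Interrupting Redo Transport by Stopping the Listener

"Database down" is a phrase we use all the time, but an outage can have different failure points and failure modes. If the listener fails and stops serving connections while Redo Apply is running, how does the system behave, and what do we observe?

Check the listener on the standby, then stop it.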

[oracle@vLIFE-URE-OT-DB-STANDBY ~]$ lsnrctl status

LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 19-OCT-2015 11:07:58

Copyright (c) 1991, 2013, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=1521)))

STATUS of the LISTENER

------------------------

(output omitted for brevity ...)

Service "vlifesb" has 2 instance(s).

  Instance "vlifesb", status UNKNOWN, has 1 handler(s) for this service...

  Instance "vlifesb", status READY, has 1 handler(s) for this service...

The command completed successfully

[oracle@vLIFE-URE-OT-DB-STANDBY ~]$ lsnrctl stop

LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 19-OCT-2015 11:08:04

Copyright (c) 1991, 2013, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=1521)))

The command completed successfully

After the standby listener is stopped, the primary's alert log reflects it shortly afterwards (about a minute later in this run).

******************************************

Fatal NI connect error 12541, connecting to:

 (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=172.16.19.90)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=vlifesb)(CID=(PROGRAM=oracle)(HOST=vLIFE-URE-OT-DB-PRIMARY)(USER=oracle))))

  VERSION INFORMATION:

        TNS for Linux: Version 11.2.0.4.0 - Production

        TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production

  Time: 19-OCT-2015 11:09:05

  Tracing not turned on.

  Tns error struct:

    ns main err code: 12541

   

TNS-12541: TNS:no listener

    ns secondary err code: 12560

    nt main err code: 511

   

TNS-00511: No listener

    nt secondary err code: 111

    nt OS err code: 0

Error 12541 received logging on to the standby

Check whether the listener is up and running.

PING[ARC2]: Heartbeat failed to connect to standby 'vlifesb'. Error is 12541.

Check the redo transport destinations on the primary.

SQL> select * from v$archive_dest_status;

  DEST_ID DEST_NAME            STATUS    TYPE          DATABASE_MODE  RECOVERY_MODE          PROTECTION_MODE      DESTINATION                                                                      STANDBY_LOGFILE_COUNT STANDBY_LOGFILE_ACTIVE ARCHIVED_THREAD# ARCHIVED_SEQ# APPLIED_THREAD# APPLIED_SEQ# ERROR                                                                            SRL DB_UNIQUE_NAME                SYNCHRONIZATION_STATUS SYNCHRONIZED GAP_STATUS

---------- -------------------- --------- -------------- --------------- ----------------------- -------------------- -------------------------------------------------------------------------------- --------------------- ---------------------- ---------------- ------------- --------------- ------------ -------------------------------------------------------------------------------- --- ------------------------------ ---------------------- ------------ ------------------------

        1 LOG_ARCHIVE_DEST_1  VALID    LOCAL          OPEN            IDLE                    MAXIMUM PERFORMANCE  /u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch                                                    0                      0                1            36              0            0                                                                                  NO  NONE                          CHECK CONFIGURATION    NO           

        2 LOG_ARCHIVE_DEST_2  ERROR    PHYSICAL      OPEN_READ-ONLY  MANAGED REAL TIME APPLY MAXIMUM PERFORMANCE  vlifesb                                                                                              3                      0                1            36              1          35 ORA-12541: TNS:no listener                                                      YES vlifesb                        CHECK CONFIGURATION    NO          RESOLVABLE GAP
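
The select * output above is hard to read at this width. A narrower query, sketched here using only columns that appear in that output, makes the destination state easier to scan. The choice of columns and the filter on INACTIVE destinations are my own, not part of the original test.

```sql
-- Run on the primary. Lists only configured destinations and the
-- columns most relevant to a transport failure.
select dest_id,
       status,
       recovery_mode,
       archived_seq#,
       applied_seq#,
       gap_status,
       error
  from v$archive_dest_status
 where status <> 'INACTIVE'
 order by dest_id;
```

In the situation shown above, destination 2 would be expected to report STATUS = ERROR together with the ORA-12541 message.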

Switch the log file on the primary.

SQL> alter system switch logfile;

System altered

SQL> select recid, sequence#, ARCHIVED, APPLIED, DELETED from v$archived_log where name='vlifesb';

    RECID  SEQUENCE# ARCHIVED APPLIED  DELETED

---------- ---------- -------- --------- -------

(rows omitted for brevity ...)

        24        33 YES      YES      NO

        26        34 YES      YES      NO

        28        35 YES      YES      NO

        30        36 YES      NO        NO

        32        37 YES      NO        NO

16 rows selected

The new log (sequence 37) has not been applied. v$log now shows:

SQL> select group#, sequence#, archived, status from v$log;

    GROUP#  SEQUENCE# ARCHIVED STATUS

---------- ---------- -------- ----------------

        1        37 YES      ACTIVE

        2        38 NO      CURRENT

        3        36 YES      INACTIVE

Force a manual checkpoint.

SQL> alter system checkpoint;

System altered

SQL> select group#, sequence#, archived, status from v$log;

    GROUP#  SEQUENCE# ARCHIVED STATUS

---------- ---------- -------- ----------------

        1        37 YES      INACTIVE

        2        38 NO      CURRENT

        3        36 YES      INACTIVE

Once the connection is restored, transport and apply resume.

[oracle@vLIFE-URE-OT-DB-STANDBY ~]$ lsnrctl start

LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 19-OCT-2015 11:14:41

Copyright (c) 1991, 2013, Oracle.  All rights reserved.

Starting /u01/app/oracle/product/11.2.0/dbhome_1/bin/tnslsnr: please wait...

TNSLSNR for Linux: Version 11.2.0.4.0 - Production

System parameter file is /u01/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora

Log messages written to /u01/app/oracle/diag/tnslsnr/vLIFE-URE-OT-DB-STANDBY/listener/alert/log.xml

Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=localhost)(PORT=1521)))

Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=1521)))

STATUS of the LISTENER

------------------------

Alias                    LISTENER

Version                  TNSLSNR for Linux: Version 11.2.0.4.0 - Production

Start Date                19-OCT-2015 11:14:41

Uptime                    0 days 0 hr. 0 min. 0 sec

Trace Level              off

Security                  ON: Local OS Authentication

SNMP                      OFF

Listener Parameter File  /u01/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora

Listener Log File        /u01/app/oracle/diag/tnslsnr/vLIFE-URE-OT-DB-STANDBY/listener/alert/log.xml

Listening Endpoints Summary...

  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=localhost)(PORT=1521)))

  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))

Services Summary...

Service "vlifesb" has 1 instance(s).

  Instance "vlifesb", status UNKNOWN, has 1 handler(s) for this service...

The command completed successfully

On the standby, we can see apply catching up.

--Standby

SQL> select recid, sequence#, ARCHIVED, APPLIED, DELETED from v$archived_log where name is not null;

    RECID  SEQUENCE# ARCHIVED APPLIED  DELETED

---------- ---------- -------- --------- -------

        11        32 YES      YES      NO

        12        33 YES      YES      NO

        13        34 YES      YES      NO

        14        35 YES      YES      NO

        15        36 YES      YES      NO

        16        37 YES      YES      NO

        17        38 YES      IN-MEMORY NO

7 rows selected

SQL> select recid, sequence#, ARCHIVED, APPLIED, DELETED from v$archived_log where name='vlifesb';

    RECID  SEQUENCE# ARCHIVED APPLIED  DELETED

---------- ---------- -------- --------- -------

(rows omitted for brevity ...)

        28        35 YES      YES      NO

        30        36 YES      YES      NO

        32        37 YES      YES      NO

        34        38 YES      NO        NO

17 rows selected
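
The catch-up above is confirmed by reading v$archived_log row by row. As a rough cross-check, a physical standby also exposes v$archive_gap. The sketch below assumes it is run on the standby after the listener is back.

```sql
-- Run on the standby. No rows returned means no archive gap
-- is currently detected for any thread.
select thread#, low_sequence#, high_sequence#
  from v$archive_gap;
```

If a row were returned, it would give the range of sequences the standby is still missing for that thread.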

3. Stopping the Apply Process

If apply is stopped on the standby while redo keeps arriving in the standby redo log, what happens?

On the standby, stop log apply.

--Standby

SQL> select open_mode, database_role from v$database;

OPEN_MODE            DATABASE_ROLE

-------------------- ----------------

READ ONLY WITH APPLY PHYSICAL STANDBY

--Stop log apply

SQL> alter database recover managed standby database cancel;

Database altered

SQL> select open_mode, database_role from v$database;

OPEN_MODE            DATABASE_ROLE

-------------------- ----------------

READ ONLY            PHYSICAL STANDBY

At this point the standby's alert log shows:

Mon Oct 19 11:18:53 2015

alter database recover managed standby database cancel

Mon Oct 19 11:18:53 2015

MRP0: Background Media Recovery cancelled with status 16037

Errors in file /u01/app/oracle/diag/rdbms/vlifesb/vlifesb/trace/vlifesb_pr00_9008.trc:

ORA-16037: user requested cancel of managed recovery operation

Managed Standby Recovery not using Real Time Apply

Recovery interrupted!

Recovered data files to a consistent state at change 1398760

Mon Oct 19 11:18:53 2015

MRP0: Background Media Recovery process shutdown (vlifesb)

Managed Standby Recovery Canceled (vlifesb)

Completed: alter database recover managed standby database cancel
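
The alert log above reports MRP0 shutting down. The same state can be checked from SQL. This is a minimal sketch, assuming 11.2, where v$managed_standby is the view for this purpose.

```sql
-- Run on the standby. After the cancel, no MRP0 row is expected,
-- while ARCH and RFS rows should remain because transport is unaffected.
select process,
       status,
       client_process,
       thread#,
       sequence#,
       block#
  from v$managed_standby
 order by process;
```

Once apply is restarted, an MRP0 row should reappear, with a status such as APPLYING_LOG or WAIT_FOR_LOG.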

Meanwhile, the primary has advanced to redo log sequence 39.

SQL> select recid, sequence#, ARCHIVED, APPLIED, DELETED from v$archived_log where name='vlifesb';

    RECID  SEQUENCE# ARCHIVED APPLIED  DELETED

---------- ---------- -------- --------- -------

(rows omitted for brevity ...)

        22        32 YES      YES      NO

        24        33 YES      YES      NO

        26        34 YES      YES      NO

        28        35 YES      YES      NO

        30        36 YES      YES      NO

        32        37 YES      YES      NO

        34        38 YES      YES      NO

17 rows selected

SQL> select group#, sequence#, archived, status from v$log;

    GROUP#  SEQUENCE# ARCHIVED STATUS

---------- ---------- -------- ----------------

        1        37 YES      INACTIVE

        2        38 YES      INACTIVE

        3        39 NO      CURRENT

Switch the primary's log several times in a row.

SQL> alter system switch logfile;

System altered

SQL> alter system switch logfile;

System altered

SQL> alter system switch logfile;

System altered

On the primary:

SQL> select group#, sequence#, archived, status from v$log;

    GROUP#  SEQUENCE# ARCHIVED STATUS

---------- ---------- -------- ----------------

        1        40 YES      INACTIVE

        2        41 YES      INACTIVE

        3        42 NO      CURRENT

SQL> select recid, sequence#, ARCHIVED, APPLIED, DELETED from v$archived_log where name='vlifesb';

    RECID  SEQUENCE# ARCHIVED APPLIED  DELETED

---------- ---------- -------- --------- -------

(rows omitted for brevity ...)

        30        36 YES      YES      NO

        32        37 YES      YES      NO

        34        38 YES      YES      NO

        36        39 YES      NO        NO

        38        40 YES      NO        NO

        40        41 YES      NO        NO

20 rows selected

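The current log has switched to sequence 42. Because the network is up, the three logs (sequences 39 to 41) were shipped to the standby successfully, but they have not been applied.

At this point we need to look at the standby redo logs on the standby.

(On the standby)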

SQL> select group#, dbid, sequence#, used, archived, status from v$standby_log;

    GROUP# DBID                  SEQUENCE#      USED ARCHIVED STATUS

---------- -------------------- ---------- ---------- -------- ----------

        4 4207470439                  42      17920 YES      ACTIVE

        5 UNASSIGNED                    0          0 NO      UNASSIGNED

        6 UNASSIGNED                    0          0 YES      UNASSIGNED

SQL> select recid, sequence#, ARCHIVED, APPLIED, DELETED from v$archived_log;

    RECID  SEQUENCE# ARCHIVED APPLIED  DELETED

---------- ---------- -------- --------- -------

(rows omitted for brevity ...)

        16        37 YES      YES      NO

        17        38 YES      YES      NO

        18        39 YES      NO        NO

        19        40 YES      NO        NO

        20        41 YES      NO        NO

20 rows selected

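Note: when apply is not running but redo keeps arriving, the standby redo log holds only the latest, current log, matching the primary. Completed logs are archived and recorded in the archived log list.

The standby alert log at this point shows: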

Mon Oct 19 11:21:57 2015

Archived Log entry 18 added for thread 1 sequence 39 ID 0xfac9d167 dest 1:

Mon Oct 19 11:21:57 2015

Primary database is in MAXIMUM PERFORMANCE mode

RFS[13]: Assigned to RFS process 15589

RFS[13]: Selected log 4 for thread 1 sequence 40 dbid -87496857 branch 892734889

Mon Oct 19 11:21:58 2015

Archived Log entry 19 added for thread 1 sequence 40 ID 0xfac9d167 dest 1:

Mon Oct 19 11:21:58 2015

Primary database is in MAXIMUM PERFORMANCE mode

RFS[14]: Assigned to RFS process 15591

RFS[14]: Selected log 4 for thread 1 sequence 41 dbid -87496857 branch 892734889

Mon Oct 19 11:22:02 2015

Archived Log entry 20 added for thread 1 sequence 41 ID 0xfac9d167 dest 1:

Mon Oct 19 11:22:02 2015

Primary database is in MAXIMUM PERFORMANCE mode

RFS[15]: Assigned to RFS process 15593

RFS[15]: Selected log 4 for thread 1 sequence 42 dbid -87496857 branch 892734889

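Note: the log shows that, on the standby, each RFS process looks for an available standby redo log group and uses it as soon as it finds one. Here group 4 had already been archived each time, so it was selected again for sequences 40, 41, and 42.

As an aside, when would no suitable standby redo log be found? I have run into the case where the file did not exist. Also, if the ARCH process on the standby cannot archive a group in time, the incoming redo would presumably be written to another standby redo log group.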
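
Whether a spare group is available depends on how the standby redo logs were sized. Oracle's general guidance is to make them the same size as the online redo logs and to have one more group per thread than there are online groups. The sketch below compares the two. The group number 7 and the 50M size in the last statement are placeholders of mine and not values from this environment.

```sql
-- Online redo log groups per thread, with their largest size.
select thread#, count(*) as group_count, max(bytes) as max_bytes
  from v$log
 group by thread#;

-- Standby redo log groups per thread, with their smallest size.
-- Groups not assigned to a thread may be reported under thread# 0.
select thread#, count(*) as group_count, min(bytes) as min_bytes
  from v$standby_log
 group by thread#;

-- Hypothetical example of adding one more group. Replace the group
-- number and size with values that match the online redo logs.
-- On a standby, managed recovery may need to be cancelled first,
-- otherwise ORA-01156 can be raised.
alter database add standby logfile thread 1 group 7 size 50m;
```

In the test above there are three online groups and three standby groups, so the setup is one group short of that guidance, although a single group was enough for this light workload.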

Now restart the apply process so that the standby catches up.

SQL> alter database recover managed standby database using current logfile disconnect from session;

Database altered

The alert log shows the apply process:

Mon Oct 19 11:23:53 2015

alter database recover managed standby database using current logfile disconnect from session

Attempt to start background Managed Standby Recovery process (vlifesb)

Mon Oct 19 11:23:53 2015

MRP0 started with pid=28, OS id=15602 

MRP0: Background Managed Standby Recovery process started (vlifesb)

 started logmerger process

Mon Oct 19 11:23:58 2015

Managed Standby Recovery starting Real Time Apply

Parallel Media Recovery started with 4 slaves

Waiting for all non-current ORLs to be archived...

All non-current ORLs have been archived.

Media Recovery Log /u01/app/oracle/fast_recovery_area/VLIFESB/archivelog/2015_10_19/o1_mf_1_39_c28rgoqb_.arc

Media Recovery Log /u01/app/oracle/fast_recovery_area/VLIFESB/archivelog/2015_10_19/o1_mf_1_40_c28rgpq1_.arc

Media Recovery Log /u01/app/oracle/fast_recovery_area/VLIFESB/archivelog/2015_10_19/o1_mf_1_41_c28rgtl2_.arc

Media Recovery Waiting for thread 1 sequence 42 (in transit)

Recovery of Online Redo Log: Thread 1 Group 4 Seq 42 Reading mem 0

  Mem# 0: /u01/app/oracle/oradata/VLIFESB/onlinelog/o1_mf_4_c265gc9q_.log

  Mem# 1: /u01/app/oracle/fast_recovery_area/VLIFESB/onlinelog/o1_mf_4_c265gcfk_.log

Completed: alter database recover managed standby database using current logfile disconnect from session

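During standby recovery, the unapplied archived logs (sequences 39 to 41) are applied first, and then the standby redo log (sequence 42) is applied.

Apply has now succeeded, and the standby has caught up with the primary.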

SQL> select group#, dbid, sequence#, used, archived, status from v$standby_log;

    GROUP# DBID                  SEQUENCE#      USED ARCHIVED STATUS

---------- -------------------- ---------- ---------- -------- ----------

        4 4207470439                  42    107520 YES      ACTIVE

        5 UNASSIGNED                    0          0 NO      UNASSIGNED

        6 UNASSIGNED                    0          0 YES      UNASSIGNED

SQL> select recid, sequence#, ARCHIVED, APPLIED, DELETED from v$archived_log;

    RECID  SEQUENCE# ARCHIVED APPLIED  DELETED

---------- ---------- -------- --------- -------

(rows omitted for brevity ...)

        15        36 YES      YES      NO

        16        37 YES      YES      NO

        17        38 YES      YES      NO

        18        39 YES      YES      NO

        19        40 YES      YES      NO

        20        41 YES      IN-MEMORY NO

20 rows selected
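
The listings above show the catch-up in terms of sequence numbers. To express it as time lag instead, v$dataguard_stats on the standby can be queried. This is only a sketch, and the reported values depend on when the statistics were last computed.

```sql
-- Run on the standby. VALUE is a day-to-second interval string;
-- TIME_COMPUTED shows when the figure was last refreshed.
select name, value, unit, time_computed
  from v$dataguard_stats
 where name in ('transport lag', 'apply lag');
```

While apply is cancelled, as in this experiment, the apply lag would be expected to grow. After the restart it should fall back towards zero.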

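4. Conclusion

The standby redo log is the standby database's counterpart of the online redo log. To Oracle, both the online redo log and the standby redo log represent the most recent redo. Depending on the database role, the two are handled along separate paths.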
