Setting up Oracle Disaster Recovery using Purity ActiveDR

Purity ActiveDR™, available starting with Purity//FA 6.0, delivers continuous, near real-time replication between two FlashArrays™ within or across data centers, enabling data protection solutions with a near-zero RPO (Recovery Point Objective) and a very short RTO (Recovery Time Objective). This results in global data protection with minimal data loss and fast failover to the Disaster Recovery (DR) site.

In this article, we will learn how to set up Oracle disaster recovery (DR) between two sites using Purity ActiveDR™.

The unit of replication, failover, and consistency with ActiveDR is a pod. Think of a pod as a logical container. Each pod is a separate namespace and can contain a mix of volumes, protection groups with member volumes, and volume snapshot history. A pod, when created, is in the promoted state, meaning it’s available to the host with read/write access. A demoted pod allows only read-only access to the host. A remote pod must be in a demoted state when a replica link is created. Local and remote pods/FlashArrays are also interchangeably referred to as the source and target pods/FlashArrays.

 

Here are the high-level steps that need to be performed to set up DR for an Oracle database:

  1. Create a PROD pod
  2. Create a protection group within the PROD pod
  3. Move volumes out of the current protection group, if applicable
  4. Move volumes into the PROD pod
  5. Add volumes into the protection group in PROD pod
  6. Create DR (remote) Pod
  7. Create Replica Link between the two pods

 

If you are new to ActiveDR, you'll find the ActiveDR Solution Overview white paper quite useful.

If you are running your Oracle databases in a VMware environment, please refer to Guidelines for ActiveDR in VMware Environments for additional guidance.

 

Environment setup

Our environment consists of a production database oraprd12 that is running on a physical Linux host sn1-r720-e03-03 connected to FlashArray sn1-x70r2-e03-27. This database uses ASM and has 6 volumes that are in the protection group oraprd12-pg. We'll refer to this database as the PROD database and the FlashArray as the PROD or local FlashArray.

We will be setting up a DR site to replicate the production database using ActiveDR. At the DR site, the database is running on physical host sn1-r720-e03-05 connected to FlashArray sn1-x70r2-e03-30. We will perform each step from the Purity GUI, as well as from the CLI. We'll refer to this database as the DR database and the FlashArray as DR or remote FlashArray. 

 

                        Local Site (PROD)    Remote Site (DR)
Host Name               sn1-r720-e03-03      sn1-r720-e03-05
FlashArray              sn1-x70r2-e03-27     sn1-x70r2-e03-30
Pod Name                oraprd12-pod         oraprd12-pod-dr
Protection Group Name   oraprd12-pg          oraprd12-pg-dr

 

Set up DR site using ActiveDR

1. Create a PROD pod

Log in to the PROD FlashArray GUI, and go to Storage --> Pods. Click on the plus icon at the top right corner of the Pods panel to bring up the following dialog. Enter a pod name and click on the Create button. Once we have completed the setup, the contents of this pod will be replicated to the remote array.

clipboard_ed87d0ecf3b46df5de63facea12feb262.png

 

This can also be performed by executing the following CLI command. 

pureuser@sn1-x70r2-e03-27> purepod create oraprd12-pod
Name            Source  Array             Status  Frozen At  Promotion Status  Link Count
oraprd12-pod    -       sn1-x70r2-e03-27  online  -          promoted          0         

Notice the Promotion Status column. A pod when created is in the promoted status, which means that it is available for both reads as well as writes.

 

2. Create a protection group within the pod

Placing the volumes that contain an Oracle database into a protection group is both common and a best practice. When we need to take a snapshot of the database, we take a snapshot of the protection group instead of the individual volumes. That ensures all volumes are write-consistent with each other and are snapshotted at the same point in time.

We cannot move an existing protection group into a pod. We need to create a new protection group within the pod, and then move existing database volumes into it.

In the GUI, go to Storage --> Pods and click on the pod we created in the previous step. That will open the detail page for the pod. Click on the plus icon at the top right corner of the Protection Groups panel. That will bring up the Create Protection Group dialog shown below. 

clipboard_e9b1aea74c05458e521d96ef5f8188fb9.png

 

Enter the name of the protection group and click Create. A new protection group will be created within the pod as shown below. Notice how a unique namespace is created by qualifying the pod name with the protection group name.

clipboard_e65624be2be843c84648e21bb65371546.png

 

Here is the CLI command to create the protection group within the pod. Notice that we have used the fully qualified name format to specify that the oraprd12-pg protection group should be created inside pod oraprd12-pod.  

pureuser@sn1-x70r2-e03-27> purepgroup create oraprd12-pod::oraprd12-pg
Name                          Source          Targets  Host Groups  Hosts  Volumes
oraprd12-pod::oraprd12-pg     oraprd12-pod    -        -            -      -      

 

3. Move volumes out of the current protection group

After the PROD pod is created, we have to move the FlashArray volumes containing the Oracle database into the PROD pod.

As shown in the following screenshot, the volumes comprising the Oracle database are originally in the protection group oraprd12-pg in the root container, i.e. they are not contained in any pod.

clipboard_e92f1101bb3c9105d189122baf2b8983e.png

Protection groups cannot be moved into a pod, and neither can volumes that are members of a protection group.

If we try to move such a volume into a pod, we will get the error "Cannot move protected volume".

Therefore, these volumes first need to be moved out of their protection group. 

In the local FlashArray GUI, go to Protection --> Protection Groups, click on the oraprd12-pg protection group and remove its members, which in this case are volumes.

clipboard_eb236b93fa60778b376647b3879377d1e.png

 

Using the following CLI command, a volume can be removed from a protection group.

pureuser@sn1-x70r2-e03-27> purevol remove --pgroup oraprd12-pg dg-oraprd12-data

 

4. Move volumes into the PROD pod

Before these volumes can be moved into the protection group oraprd12-pod::oraprd12-pg, they first need to be moved into the pod oraprd12-pod.

clipboard_ea76e99f7a1a806e050ac0432b2057e69.png

 

clipboard_e15848ad556ebd44071bb5525bada5b63.png

 

The purevol move CLI command can be used to move a volume into a pod.

pureuser@sn1-x70r2-e03-27> purevol move dg-oraprd12-data oraprd12-pod

 

 

5. Add volumes to the protection group in PROD pod

Next, go to the protection group inside the pod oraprd12-pod::oraprd12-pg that we created in step 2. Open the menu of the Members panel and select Add Volumes....

clipboard_e16e5e79b4955bd621d7711e59cd875dc.png

 

Add the database volumes to this protection group, as shown below.

clipboard_ee8a9d9e60ae4adf6977471edf4bb3862.png

 

The following CLI command shows how we add a volume to a protection group. Notice how we have qualified the volume name with the name of the pod in which it resides.

pureuser@sn1-x70r2-e03-27> purevol add --pgroup oraprd12-pod::oraprd12-pg oraprd12-pod::dg-oraprd12-data
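
Steps 3 to 5 have to be repeated for every database volume. The following is a minimal dry-run sketch that only prints the three commands shown above for each volume, so the list can be reviewed before anything is run on the array. The function name gen_move_cmds is hypothetical, and dg-oraprd12-fra is a made-up volume name used for illustration; only dg-oraprd12-data appears in this article.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the Purity CLI commands from steps 3-5 for each volume.
# Nothing is executed against an array.

gen_move_cmds() {
  # $1 = current protection group, $2 = pod, $3 = protection group inside the pod
  # remaining arguments = volume names
  local old_pg="$1" pod="$2" new_pg="$3"
  shift 3
  local vol
  for vol in "$@"; do
    echo "purevol remove --pgroup ${old_pg} ${vol}"
    echo "purevol move ${vol} ${pod}"
    echo "purevol add --pgroup ${pod}::${new_pg} ${pod}::${vol}"
  done
}

# dg-oraprd12-fra is a hypothetical volume name.
gen_move_cmds oraprd12-pg oraprd12-pod oraprd12-pg dg-oraprd12-data dg-oraprd12-fra
```

The printed lines can then be pasted into a CLI session on the PROD FlashArray once they have been checked.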

 

6. Create Remote Pod

Log into the DR or remote FlashArray GUI, and create a "remote" pod, similar to the one we created in step 1. 

The name should be different or you will get the error Local pod name must be different from remote pod name.

Pods are created with the promotion status of promoted which means that they can be read as well as written by the application. Since the remote pod will initially be only receiving the replication stream, it needs to be in a demoted state before the Replica Link can be created to start replication. 

To demote the pod, click on the menu icon (three vertical dots) and select Demote... from the popup menu.

clipboard_eafd05df788a01fdaab15fd30875c4657.png

 

The following CLI commands, executed on the remote array, create a pod and then demote it so that a replica link can be created.

pureuser@sn1-x70r3-e03-30> purepod create oraprd12-pod-dr

pureuser@sn1-x70r3-e03-30> purepod demote oraprd12-pod-dr

 

7. Create Replica Link

We can now create the replica link between the two pods. 

Log in to the local FlashArray GUI, and go to Protection --> ActiveDR. Click on the plus sign on the top right corner of the Pod Replica Links panel to bring up the following dialog.

clipboard_ee17192dc2d35063271a00d4345e92bea.png

 

When the Create button is clicked, ActiveDR automatically begins the initial synchronization, known as baselining, using Purity's asynchronous snapshot-based replication engine. During this phase, the Status column shows baselining.

clipboard_eb1497a955af0c776119c9178357c0dce.png

 

Once baselining is complete, ActiveDR automatically transitions to its normal replicating mode where low RPO continuous replication is used.

The Recovery Point column displays the timestamp of the most recent changes that have been successfully replicated to the remote pod. It represents the recovery point if the remote pod is promoted.

The Lag column shows how far, in time, the DR pod is behind the PROD pod.

clipboard_ecf94c884e8d803daa7504a2255b4d64b.png

 

Here's how we create a replica link using the CLI command.

pureuser@sn1-x70r2-e03-27> purepod replica-link create oraprd12-pod --remote-pod oraprd12-pod-dr --remote sn1-x70r3-e03-30
Name            Direction  Remote Pod         Remote            Status      Recovery Point  Lag
oraprd12-pod    -->        oraprd12-pod-dr    sn1-x70r3-e03-30  baselining  - 

pureuser@sn1-x70r2-e03-27> purepod replica-link list
Name                   Direction  Remote Pod             Remote            Status       Recovery Point           Lag
oraprd12-pod           -->        oraprd12-pod-dr        sn1-x70r3-e03-30  replicating  2020-10-24 02:08:16 PDT  1s 
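
When scripting the setup, it can help to extract the Status and Lag values from this listing, for example to wait until baselining has finished. The following is a minimal sketch that assumes the column layout shown above; link_status is a hypothetical helper name, and the sample text is copied from the listing rather than read from an array.

```shell
#!/usr/bin/env bash
# Sketch: read `purepod replica-link list` output on stdin and print the
# Status and Lag of one local pod. Assumes the column layout shown above.

link_status() {
  # $1 = local pod name. Field 5 is Status; the last field is Lag.
  awk -v pod="$1" '$1 == pod { print $5, $NF }'
}

# Sample text copied from the listing above.
sample='Name                   Direction  Remote Pod             Remote            Status       Recovery Point           Lag
oraprd12-pod           -->        oraprd12-pod-dr        sn1-x70r3-e03-30  replicating  2020-10-24 02:08:16 PDT  1s'

printf '%s\n' "$sample" | link_status oraprd12-pod
```

Against a live array, the same function could be fed from the real command, for instance by running purepod replica-link list over SSH and piping the result into link_status in a polling loop until the status reads replicating.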

 

After a replica link is created, only new volumes can be created inside the pod. Existing volumes can no longer be moved into it.

 

 

Failover Preparation and Testing

Unlike ActiveCluster™, with ActiveDR the serial number of a volume in the remote pod will be different from the serial number of the corresponding volume in the local pod. This allows us to manage the remote pod volumes independently and prevents hosts, applications, and multipathing software from mistakenly treating the source and target volumes as the same volume.

To minimize the RTO in the event of a failover, it is recommended to prepare the remote site beforehand. This will also help with creating scripts to perform automatic failover and failback as many script parameters like host names and volume serial numbers will be known. 

In preparation for failover, we will pre-connect hosts to the write-disabled volumes at the DR site (in the demoted pod). This allows devices and paths to be pre-discovered and pre-created on the DR hosts to make the overall failover process shorter and simpler. Though hosts can be connected to the target pod volumes, these volumes will be read-only while the target pod is in a demoted state.  

The following steps prepare the DR site for failover as well as test the successful failover to the DR database. 

It is assumed that the database installation prerequisites, such as creating OS users and groups, setting kernel parameters to the same values as PROD, and installing Grid Infrastructure, have already been completed.

 

1. Pre-attach the volumes to the database host on the remote site.

To minimize the RTO in case of a failure, it is recommended that DR pod volumes be pre-connected to the hosts.  

clipboard_e855bd79e24ffc2f9228178c4f9ad37ff.png
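
The screenshot shows the GUI route. As a CLI alternative, the following minimal sketch prints one purevol connect command per volume. It assumes that the host object on the DR FlashArray is named after the DR host and that the replicated volumes appear under the DR pod with the same base names as on PROD; gen_connect_cmds is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the commands that connect DR pod volumes to the DR host.
# Nothing is executed against an array.

gen_connect_cmds() {
  # $1 = host object on the DR array, $2 = DR pod, remaining = volume base names
  local host="$1" pod="$2"
  shift 2
  local vol
  for vol in "$@"; do
    echo "purevol connect --host ${host} ${pod}::${vol}"
  done
}

# Assumption: the volume keeps its PROD base name inside the DR pod.
gen_connect_cmds sn1-r720-e03-05 oraprd12-pod-dr dg-oraprd12-data
```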

 

2. Promote the remote pod. 

We need to promote the oraprd12-pod-dr pod so that the replication stream stops writing to the pod and to make the volumes available to the host for mounting. The volume content presented to the hosts and their applications will be at the point in time contained by the last successfully completed replication transfer.

Wait for the state to change from promoting to promoted before proceeding.
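
On the CLI, the pod is promoted with purepod promote oraprd12-pod-dr on the DR FlashArray. The following minimal sketch shows one way to wait for the promoted state in a script. It assumes the purepod list column layout shown in step 1 of the setup, where Promotion Status is the second-to-last column. The names promotion_status, wait_promoted, POLL_SECONDS, and sample_list are hypothetical, and sample_list stands in for the real purepod list call.

```shell
#!/usr/bin/env bash
# Sketch: poll `purepod list` output until a pod reports "promoted".

promotion_status() {
  # stdin: `purepod list` output; $1 = pod name.
  # Promotion Status is the second-to-last column.
  awk -v pod="$1" '$1 == pod { print $(NF-1) }'
}

wait_promoted() {
  # $1 = pod, $2 = maximum attempts, remaining = command printing `purepod list` output
  local pod="$1" tries="$2"
  shift 2
  local i=0
  while [ "$i" -lt "$tries" ]; do
    [ "$("$@" | promotion_status "$pod")" = "promoted" ] && return 0
    sleep "${POLL_SECONDS:-5}"
    i=$((i + 1))
  done
  return 1
}

# Stand-in for the real command, using the row shown in step 1 of the setup.
sample_list() {
  printf '%s\n' \
    'Name            Source  Array             Status  Frozen At  Promotion Status  Link Count' \
    'oraprd12-pod    -       sn1-x70r2-e03-27  online  -          promoted          0'
}

POLL_SECONDS=0 wait_promoted oraprd12-pod 3 sample_list && echo "oraprd12-pod is promoted"
```

In a real script, sample_list would be replaced by a command that returns the DR array's purepod list output, and the pod name by oraprd12-pod-dr.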


3. Verify that volumes are visible on the database host(s) 

The DR volumes should also be configured identically to the PROD volumes in the DR host OS. For instance, if using multipath I/O, the device mapper names and aliases should be the same. 

 

3a. Rescan SCSI bus

# rescan-scsi-bus.sh

3b. Add multipath aliases in /etc/multipath.conf

If there are a large number of volumes, it's easier to copy the multipath aliases from the PROD host and update the wwids.
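
Because the DR volumes have different serial numbers, each alias needs the WWID of the DR volume. On Linux, a FlashArray volume's WWID is the prefix 3624a9370 followed by the volume serial number in lowercase. The following minimal sketch builds a multipath stanza from an alias and a serial number. The helper names are hypothetical and the serial number is made up for illustration; the real values come from the Serial column of the volume listing on the DR FlashArray.

```shell
#!/usr/bin/env bash
# Sketch: build /etc/multipath.conf alias stanzas for FlashArray volumes.

serial_to_wwid() {
  # $1 = volume serial; WWID = 3624a9370 + serial in lowercase
  printf '3624a9370%s\n' "$(printf '%s' "$1" | tr 'A-F' 'a-f')"
}

multipath_stanza() {
  # $1 = alias, $2 = volume serial
  printf '    multipath {\n        wwid   %s\n        alias  %s\n    }\n' \
    "$(serial_to_wwid "$2")" "$1"
}

# The serial below is a made-up example, not a real volume serial.
multipath_stanza dg-oraprd12-data 0123456789ABCDEF01234567
```

The generated stanzas belong inside the multipaths { ... } section of /etc/multipath.conf.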

3c. Restart the multipath service

# service multipathd restart

3d. Check device ownership 

Make sure all dm- devices corresponding to the volume multipaths are owned by grid:asmadmin. Set up the required udev rules to set the device ownership if this has not been done already.

The devices should be visible as asm disks since they are already labeled as ASM disks.

3e. Mount disks 

Scan the disks to make sure the latest state of the headers is read. In our case, we are using the ASM Filter Driver (AFD), so we use the afd_scan command in asmcmd. If you are using ASMLib for device persistence, use the oracleasm scandisks command.

[grid@sn1-r720-e03-05 ~]$ asmcmd 

ASMCMD> afd_scan

 

In asmcmd, run "lsdg --discovery". It should find the disk groups on the newly added disks.

ASMCMD>  lsdg --discovery
State       Type    Rebal  Sector  Logical_Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
DISMOUNTED          N           0               0      0        0         0        0                0               0              0             N  ORAPRD12_DATA/
DISMOUNTED          N           0               0      0        0         0        0                0               0              0             N  ORAPRD12_FRA/
DISMOUNTED          N           0               0      0        0         0        0                0               0              0             N  ORAPRD12_REDO/

 

If the State is DISMOUNTED, mount the disk groups.

ASMCMD> mount ORAPRD12_DATA
ASMCMD> mount ORAPRD12_FRA
ASMCMD> mount ORAPRD12_REDO

 

Make sure that the State is MOUNTED before proceeding to the next step.

ASMCMD> lsdg
State    Type    Rebal  Sector  Logical_Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512             512   4096  1048576   8388608  5407628                0         5407628              0             N  ORAPRD12_DATA/
MOUNTED  EXTERN  N         512             512   4096  1048576   2097152  2096433                0         2096433              0             N  ORAPRD12_FRA/
MOUNTED  EXTERN  N         512             512   4096  1048576    204800   204744                0          204744              0             N  ORAPRD12_REDO/

 

4. Mount Oracle software volume

Copy the contents of the database Oracle Home directory from the PROD host to the DR host. Make sure that all paths, such as $ORACLE_BASE and $ORACLE_HOME, are the same on both hosts.

 

5. Register the database

Run the following commands on PROD database host to get configuration information for the oraprd12 database.

[grid@sn1-r720-e03-03 ~]$ srvctl config asm
ASM home: <CRS home>
Password file: +DATA/orapwasm
Backup of Password file: 
ASM listener: LISTENER
Spfile: +DATA/ASM/ASMPARAMETERFILE/registry.253.1032470389
ASM diskgroup discovery string: /dev/mapper/*,AFD:*
[oracle@sn1-r720-e03-03 ~]$ srvctl config database -db oraprd12
Database unique name: oraprd12
Database name: oraprd12
Oracle home: /u01/app/oracle/product/19.0.0/dbhome_1
Oracle user: oracle
Spfile: +ORAPRD12_DATA/ORAPRD12/PARAMETERFILE/spfile.265.1042658127
Password file: 
Domain: puretec.purestorage.com
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Disk Groups: ORAPRD12_DATA,ORAPRD12_FRA
Services: 
OSDBA group: 
OSOPER group: 
Database instance: oraprd12

 

As the PROD and DR volumes are copies, we use the spfile name and path from the above listing to register the oraprd12 database on the DR site.

[oracle@sn1-r720-e03-05 ~]$ srvctl add database -db oraprd12 -oraclehome /u01/app/oracle/product/19.0.0/dbhome_1 \
                                                -spfile +ORAPRD12_DATA/ORAPRD12/PARAMETERFILE/spfile.265.1042658127

 

Create audit directories for oraprd12 if they do not exist.

[oracle@sn1-r720-e03-05 ~]$ mkdir $ORACLE_BASE/audit/oraprd12
[oracle@sn1-r720-e03-05 ~]$ mkdir $ORACLE_BASE/admin/oraprd12/adump

 

6. Start the database 

It should come up without any problems.

[oracle@sn1-r720-e03-05 oraprd12]$ srvctl start database -db oraprd12

 

7. Revert DR to normal mode

After completion of the database/application failover testing, we need to get the DR site back to normal mode, i.e. the state where the pod is in the demoted state and continuously applying the replication stream from PROD. 

Demoting the DR pod will cause any test data written into that pod to be discarded from the pod. When a pod is demoted, Purity creates an undo pod. This undo pod is a special object that, if cloned, can be accessed as a new pod to gain access to that data.

clipboard_ed30281af8e363f34c0556b4dd62bc858.png

 

For demoting the DR pod, we need to do the following:

1. Stop the database

2. Unmount the filesystems or ASM disk groups

3. Demote the oraprd12-pod-dr pod
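
The three steps above can be sketched as a command list. This is a dry run that only prints the commands, under the assumption that the database is registered with srvctl as shown in step 5 and that the disk groups are the ones mounted in step 3e; revert_dr_plan is a hypothetical helper name. The srvctl command runs as oracle and the asmcmd commands as grid on the DR host, while purepod demote runs on the DR FlashArray.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the commands that return the DR site to normal mode.
# Nothing is executed.

revert_dr_plan() {
  # $1 = database name, $2 = DR pod, remaining = ASM disk groups to unmount
  local db="$1" pod="$2"
  shift 2
  echo "srvctl stop database -db ${db}"
  local dg
  for dg in "$@"; do
    echo "asmcmd umount ${dg}"
  done
  echo "purepod demote ${pod}"
}

revert_dr_plan oraprd12 oraprd12-pod-dr ORAPRD12_DATA ORAPRD12_FRA ORAPRD12_REDO
```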

 

Real Failover to DR site

A failure at the production site might necessitate failover to the DR site. The nature of the failure can range from the entire site going down to just a few components or systems going down. The call to fail over to the DR site is a technical as well as a business decision. The Oracle database may or may not be up when that call is made.

ActiveDR does not automatically or transparently fail over to the DR FlashArray upon loss or unavailability of the PROD FlashArray. The ActiveDR failover process for applications and hosts must be triggered by the administrator. In the previous sections, we saw the command-line equivalents of the GUI actions, so it is quite straightforward to create scripts that automate the failover and failback process.

Once a disaster is determined or declared, and a decision to fail over has been made, the administrator would typically perform the following steps to fail over an Oracle database and application:

  1. Stop all applications on PROD site, if they are still up 
  2. Stop all Oracle databases on the PROD site
  3. Stop all ASM instances, which unmounts all ASM disk groups
  4. Promote the DR pod. This will make the DR pod writable by the databases and applications
  5. If the FlashArray is not impacted by the failure and is accessible, demote the PROD pod, preferably using the quiesce option
  6. If required, make the necessary changes to the network load balancers/DNS servers to redirect connections to the DR site
  7. Start ASM instances in the DR site, verify all disk groups are mounted
  8. Start Oracle databases on the DR site
  9. Start applications on the DR site.

As we can see, none of the steps listed above can be called "configuration" steps. That's because all configuration steps were one-time in nature, and were already performed in the failover preparation phase. 

One of the nice features of ActiveDR is that the API/CLI commands for a test failover are the same as those for a real failover. The main difference is that in a test failover, the PROD pod is also in promoted status and continues to send the replication stream to the DR site. This stream is not applied to the promoted DR pod. Instead, it is staged on the DR FlashArray for later application.

 

Planned Failover to DR site

Often there is a need to perform planned failovers or migrations between sites with a short interruption for cutover. The planned failover process assumes that all applications can be gracefully shut down before they are started again on the other site. The process for doing that would be:

  1. Gracefully stop the applications at the PROD site
  2. Stop Oracle databases on the PROD site
  3. Stop ASM instances on the PROD site
  4. Demote the PROD pod using the quiesce option
  5. Promote the DR pod
  6. If required, make the necessary changes to the network load balancers/DNS servers to redirect connections to the DR site
  7. Start ASM instances, rescan disks, make sure disk groups are mounted
  8. Start Oracle databases on DR site
  9. Start application processes

 

Note that we have used the quiesce option while demoting the PROD pod. The quiesce option tells ActiveDR to put the replica link into an idle state after all content has been sent to the target.
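
The array-side part of the planned failover (steps 4 and 5) can be sketched as follows. This is a dry run that only prints the two commands; it assumes the Purity CLI is reached over SSH as pureuser, and planned_switch_cmds is a hypothetical helper name. The array names are taken from the CLI prompts shown earlier in this article.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the array-side commands for a planned role switch.
# Nothing is executed.

planned_switch_cmds() {
  # $1 = array holding the promoted pod, $2 = pod to demote
  # $3 = array holding the demoted pod,  $4 = pod to promote
  echo "ssh pureuser@$1 purepod demote --quiesce $2"
  echo "ssh pureuser@$3 purepod promote $4"
}

planned_switch_cmds sn1-x70r2-e03-27 oraprd12-pod sn1-x70r3-e03-30 oraprd12-pod-dr
```

Swapping the two array and pod pairs gives the array-side commands for the failback described in the Failback to PROD site section.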

 

Reversing Replication for Re-Protection

After a real failover, the DR site becomes the new production site. When the original PROD site returns to service, the administrators may decide to continue to run the production applications on the DR site, designating it as the new PROD site. The roles will be reversed and the original PROD site will now become the DR site. What remains to be done is to reverse the replication direction and re-protect the new PROD site. 

  1. Stop the applications and databases on the original PROD, if not done already
  2. Demote the original PROD pod using the skip quiesce option, if not done already

Whenever we demote a pod that is the source of a replica link, if the other pod is promoted, ActiveDR will automatically reverse replication, ensuring that the new application data at the other site is protected as quickly as possible. This eliminates risks associated with manual processes and automates the reversal of replication relationships. Automatic replication reversal does not happen during DR tests because the DR test pod is the target, not the source, of the replica link.

Demoting the original source pod will also save a temporary copy of the source pod content (prior to replication reversal) in the recycle bin for 24 hours. If needed, you can clone the .undo pod so that you can recover any data written but not replicated before the outage.

 

Failback to PROD site

After a real failover, the DR site becomes the new production site. Administrators would reverse the replication direction and re-protect the DR site as described above. However, at some later point, they may decide to fail back to the original state. A short downtime will need to be scheduled.

To accomplish this, we use the following process:

  1. Gracefully stop the production applications at the original DR site
  2. Stop Oracle databases on the original DR site
  3. Stop ASM instances on the original DR site
  4. Demote the oraprd12-pod-dr pod using the quiesce option
  5. Promote the oraprd12-pod pod
  6. If required, make the necessary changes to the network load balancers/DNS servers to redirect connections to the PROD site
  7. Start ASM instances, rescan disks, make sure disk groups are mounted
  8. Start Oracle databases on PROD site
  9. Start application processes