Pure Technical Services

SafeMode with Site Recovery Manager User Guide


For FlashArrays with SafeMode enabled, additional planning and considerations are required for the best experience with Site Recovery Manager (SRM). SRM manages FlashArray storage through the SRA for VMFS and through VASA for vVols. The SRA and VASA create, destroy, and eradicate volumes, add volumes to and remove volumes from FlashArray protection groups, and disable snapshot/replication schedules. How does enabling FlashArray SafeMode impact SRM with SRA or vVols?

Site Recovery Manager with SafeMode and SRA - SRA 4.2.0 and Above

In SRA 4.2.0 and above, there are no known issues that require workarounds when SafeMode is enabled on the FlashArray(s) connected to SRM. 

The changes that were made to the SRA to accomplish this are as follows:

  1. When a test failover is run, the SRA looks for a volume called <volume-name>-testFailover in the state deleted; if this volume is detected, the SRA recovers it then overwrites it.
  2. With a failover, the SRA creates a snapshot with the name <volume name>-demoted.PrepareReverse-<count> instead of <volume name>-demoted.PrepareReverse. The <count> increments each time a reverse replication is run.
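The naming change in item 2 can be illustrated with a short sketch. This is not the SRA's code: the function name is hypothetical, and starting the count at 1 is an assumption, since the guide only states that the count increments on each reverse replication.

```python
import re


def next_prepare_reverse_name(volume, existing_snapshots):
    """Return the next '<volume>-demoted.PrepareReverse-<count>' name.

    Hypothetical helper for illustration only; the starting count of 1
    is an assumption, not documented SRA behavior.
    """
    pattern = re.compile(
        re.escape(volume) + r"-demoted\.PrepareReverse-(\d+)"
    )
    counts = []
    for name in existing_snapshots:
        match = pattern.fullmatch(name)
        if match:
            counts.append(int(match.group(1)))
    # A fresh suffix never collides with a snapshot that SafeMode
    # is still holding in the pending-eradication state.
    return f"{volume}-demoted.PrepareReverse-{max(counts, default=0) + 1}"


existing = [
    "vol1-demoted.PrepareReverse-1",
    "vol1-demoted.PrepareReverse-2",
    "vol2-demoted.PrepareReverse-7",
]
first_name = next_prepare_reverse_name("vol1", [])
next_name = next_prepare_reverse_name("vol1", existing)
```

Because every run gets a new suffix, an older snapshot that could not be eradicated no longer blocks the next reverse replication.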

 

Site Recovery Manager with SafeMode and SRA - SRA 4.1.0 and Lower

There are two main problem points when using SRM with FlashArrays that have SafeMode enabled and SRA version 4.1.0 or lower: the test failover workflow and the re-protect workflow. This section covers how to work around these issues.

Test Failover Workflow with SafeMode

Running a test failover on the recovery plan will succeed without any issue. The problems start when the cleanup is issued, because the SRA is unable to eradicate objects. Follow these steps when running a test failover cleanup with SafeMode enabled.

  1. After the test failover is complete and the test is ready to be cleaned up, run the test failover cleanup once (it will fail)
  2. On the target FlashArray, recover the "puresra-testFailover" volume(s)
  3. On the target FlashArray, rename and destroy the "puresra-testFailover" volume(s)
  4. In SRM, re-run the test failover cleanup and do not check the force cleanup box

The second cleanup will succeed without any issue. The main reason to rename the test failover volumes is to make sure that another test failover run afterwards will succeed. If the "puresra-testFailover" volumes are pending eradication, the SRA is unable to recover volumes with the name it wants, and the test failover will fail.

Now that the steps are outlined, here are the workflows.  

After the test failover has completed, the SRA will recover the test volumes on the target array and name them with a suffix of "puresra-testFailover". 

SafeMode-SRA-ScreenShot-01.png
SafeMode-SRA-ScreenShot-02.png

When running the test failover cleanup, the first workflow will fail. This is because the SRA is unable to eradicate the test failover volumes. On the target FlashArray, recover the "puresra-testFailover" volume(s), rename them, and then destroy them.

SafeMode-SRA-ScreenShot-03.png
The first cleanup fails
SafeMode-SRA-ScreenShot-04.png
Recover the destroyed puresra-testFailover volume
SafeMode-SRA-ScreenShot-05.png
Rename the puresra-testFailover volume
SafeMode-SRA-ScreenShot-06.png
SafeMode-SRA-ScreenShot-07.png
Destroy the renamed volume

After the volumes have been renamed and destroyed, run the test failover cleanup workflow again.  The force cleanup box does not need to be checked.

SafeMode-SRA-ScreenShot-08.png
Run the cleanup again and there is no need to check the Force cleanup box
SafeMode-SRA-ScreenShot-09.png

Overall, SafeMode only affects the cleanup stage of the test failover workflow. A force cleanup is possible, but renaming and destroying the test failover volume(s) is the best way to make sure that future test failover workflows do not fail.


Failover Workflow with SafeMode

There will not be any issues with the failover/recovery workflow. None of the steps the SRA follows here are impacted by SafeMode being enabled, so the RTO is not impacted by SafeMode.


Re-Protect Workflow with SafeMode

This is where things get more complicated. Many steps are needed to clean up both the source and the recovery site in order to complete the re-protect workflow.

  1. Rename and destroy the demoted volume on the source FlashArray
  2. Destroy the "demoted" protection group on the source FlashArray
  3. Run the re-protect twice, as the first run will fail but the second will succeed
  4. Recover the destroyed "demoted" protection group on the source FlashArray
  5. Rename and destroy the "PureSRADefaultProtectionGroup" on the target FlashArray
  6. Place the recovered volume in the FlashArray protection group that will protect the volume
  7. Replicate the protection group snapshot as a sync once as a "manual" re-protect
  8. From SRM run a device discovery for the array pair configured in the SRA

As we can see, SafeMode requires quite a few more manual steps. The main reason this manual work must be done is that, with SafeMode enabled, the SRA is unable to remove volumes from protection groups and is unable to disable replication schedules for protection groups. Now that we have the steps, here is a direct look at the workflow.
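For recovery plans with many volumes, the eight steps above can be expanded into a per-object checklist. The sketch below only builds text; it does not talk to a FlashArray or SRM, and the function name and all example object names are hypothetical.

```python
def reprotect_runbook(demoted_volumes, demoted_pgroups,
                      recovered_volumes, protecting_pgroup):
    """Expand the manual re-protect steps into ordered (where, action) pairs."""
    steps = []
    for vol in demoted_volumes:
        steps.append(("source", f"rename and destroy volume {vol}"))
    for pg in demoted_pgroups:
        steps.append(("source", f"destroy protection group {pg}"))
    steps.append(("SRM", "run re-protect twice; the first run is expected to fail"))
    for pg in demoted_pgroups:
        steps.append(("source", f"recover protection group {pg}"))
    steps.append(("target", "rename and destroy PureSRADefaultProtectionGroup"))
    for vol in recovered_volumes:
        steps.append(
            ("target", f"add volume {vol} to protection group {protecting_pgroup}")
        )
    steps.append(
        ("target", f"create and replicate a manual snapshot of {protecting_pgroup}")
    )
    steps.append(("SRM", "run device discovery for the array pair"))
    return steps


# Hypothetical object names, for illustration only.
runbook = reprotect_runbook(
    demoted_volumes=["ds01-puresra-demoted"],
    demoted_pgroups=["srm-pgroup-01"],
    recovered_volumes=["ds01-puresra-failover"],
    protecting_pgroup="srm-pgroup-02",
)
for number, (where, action) in enumerate(runbook, start=1):
    print(f"{number}. [{where}] {action}")
```

With one volume and one protection group the output has exactly eight entries, matching the numbered list; additional volumes or pgroups add one line each in the relevant step.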

Here the recovery was completed successfully.

SafeMode-SRA-ScreenShot-10.png

On the target FlashArray, we can see the recovered volume has a "puresra-failover" suffix and on the source FlashArray we can see the volume has a "puresra-demoted" suffix on it.  We can also see that the source protection group still has the replication schedule enabled.  If we try to remove the demoted volume from the pgroup the request will fail.

SafeMode-SRA-ScreenShot-11.png
The puresra-failover volume created on the recovery array
SafeMode-SRA-ScreenShot-12.png
The protection group on the source array with the demoted volume.  Notice that replication is still enabled
SafeMode-SRA-ScreenShot-13.png
You cannot remove the volume from the protection group when SafeMode is enabled

On the source FlashArray, rename and destroy each of the "-puresra-demoted" volumes that were recovered from this SRM recovery plan.

SafeMode-SRA-ScreenShot-17.png
Rename the demoted volume
SafeMode-SRA-ScreenShot-18.png
Destroy the renamed demoted volume

Then, destroy the FlashArray protection groups that were failed over in the SRM recovery plan.  We will be recovering these pgroups after the re-protect has completed.

SafeMode-SRA-ScreenShot-19.png
Destroy the demoted protection group on the source array

After navigating back to SRM, you can now run the re-protect again. The first attempt will fail, but the next one should succeed. If it also fails, make sure that all the demoted volumes were renamed and destroyed and that the protection groups were destroyed. A device discovery might also be required to refresh this information. When looking at the details of the successful re-protect, you will see that it completed with warnings. This is because the SRA was unable to find the demoted volume and pgroup.

SafeMode-SRA-ScreenShot-20.png
Run the re-protect workflow a couple of times.  The first one will fail as device discovery information is likely stale
SafeMode-SRA-ScreenShot-21.png
With a successful re-protect you will notice warnings that the demoted volume(s) were not found; this is expected

The demoted protection group can be recovered on the source FlashArray.  Once recovered you will see the schedule is still enabled, but there are no members found as the demoted volume was destroyed.

SafeMode-SRA-ScreenShot-33.png
On the source array, recover the destroyed protection group
SafeMode-SRA-ScreenShot-34.png
The renamed demoted volume is destroyed and not part of the recovered protection group now

The next step needed to fully complete the re-protect workflow is making sure that the recovered volumes are in the right FlashArray protection group.

By default the SRA will first create a temporary default pgroup to place the volumes in during the re-protect process.  With SafeMode enabled, SRA is unable to progress in the workflow far enough to move the volumes to a different pgroup that meets the requirements or destroy the temp pgroup.  As such, the temp pgroup should be renamed and destroyed.  

SafeMode-SRA-ScreenShot-22.png
The PureSRADefaultProtectionGroup on the recovery array
SafeMode-SRA-ScreenShot-23.png
Notice here that both the snapshot and replication schedules are enabled, even though only replication was enabled on the source
SafeMode-SRA-ScreenShot-24.png
Rename the default pgroup
SafeMode-SRA-ScreenShot-25.png
SafeMode-SRA-ScreenShot-26.png
Destroy the renamed default pgroup

Once that is complete, place the recovered volumes in the FlashArray protection group(s) that meet the protection requirements. Once the volumes are in the right pgroups, take manual snapshots and replicate them as a one-time sync for the re-protect process. After that, run a device discovery in SRM for the SRA to ensure the devices are protected.

SafeMode-SRA-ScreenShot-27.png
Find the recovered volume/s and add them to a protection group
SafeMode-SRA-ScreenShot-28.png
SafeMode-SRA-ScreenShot-29.png
Create and replicate a manual re-protect snapshot
SafeMode-SRA-ScreenShot-30.png
SafeMode-SRA-ScreenShot-31.png
From SRM run a device discovery and confirm the datastore/volume show up in the device list
SafeMode-SRA-ScreenShot-32.png

Now that we have followed all of these steps, running a test failover or recovery workflow will succeed.  There will still need to be manual steps in order to complete a fail back and re-protect, but they will be the same as covered in the initial failover process.  

Recommendations for SafeMode + SRM + VMFS/RDMs

UPGRADE TO SRA 4.2.0+ for full SafeMode support OR:

Here are some high level recommendations when using SRM with SafeMode enabled on the FlashArray.

  • Prior to enabling SafeMode on the FlashArray, create the SRM protection groups and recovery plans
  • Prior to enabling SafeMode on the FlashArray, run Test Failover and Test Failover Cleanup workflows for the SRM recovery plans
  • Prior to enabling SafeMode on the FlashArray, have each FlashArray protection group created to meet the SLA needs on both source and target arrays
  • Plan for and expect many manual steps to complete workflows with SRM when SafeMode is enabled
  • Access to the FlashArray and manual workflows are required to use SafeMode with SRM for VMFS and RDMs

Site Recovery Manager with SafeMode and vVols

A benefit of using vVols with SRM when SafeMode is enabled is that every workflow works as expected. That does not mean operations can never fail: because VASA is no longer able to eradicate objects, object counts become elevated on both the source and target arrays. With this in mind, let's cover each workflow and the impact SafeMode has at scale.

Test Failover

There shouldn't be any issues when running a test failover on the replication groups in the recovery plan. VASA will create placeholder protection groups on the target FlashArray and will "recover" the volumes and place them in those pgroups, but the replication and snapshot schedules are both disabled. In the event that the storage policy's rule sets have a replication retention or interval higher than the default, then, at this time, VASA will be able to change the frequency to be higher than the default. This is a bug in Purity and will be fixed in a future release.

Test Failover Cleanup

When a test failover cleanup is issued, VASA would normally destroy and eradicate each of the test failover volumes and volume groups. While the cleanup will succeed in SRM, the volumes on the array will not be eradicated, thus leading to elevated object count numbers.

Failover

The failover will work correctly and as expected.  The caveat with the failover is that VASA will be unable to disable the source protection groups that are failed over on the FlashArray.  This will cause snapshots to continue to be replicated after the failover is complete which will lead to higher volume snapshot object counts.

Re-Protect

Everything works as expected here, provided that the storage policy's rule sets for replication interval and retention are not higher than the default protection group schedules (again, this is a current Purity bug and will be fixed in a future release).

Recommendations for SafeMode + SRM + vVols

Here are some high level recommendations for using SRM with vVols when SafeMode is enabled.

  • Prior to enabling SafeMode on the array, ensure that all SRM protection groups and recovery plans are created
  • Prior to enabling SafeMode on the array, run through Test Failover workflows for each recovery plan
    • This will allow VASA to create mappings for FlashArray protection groups and vVols replication groups on the recovery FlashArray
  • Always run a Test Failover and Test Failover cleanup ASAP after a successful recovery plan failover and re-protect
    • This is crucial to keeping object count minimal as the cleanup will destroy the re-used objects
  • Keep a very close eye on object count and object count limits on both the source and target FlashArrays
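One way to act on the last recommendation is a simple headroom check run against each array's reported counts. The sketch below is an assumption-heavy illustration: the function name, the 80% threshold, and all counts and limits are placeholders, not real Purity limits. Use the values and limits published for your FlashArray model and Purity version.

```python
def object_count_alerts(counts, limits, threshold=0.8):
    """Return a message for each object type at or above threshold * limit."""
    alerts = []
    for kind, count in counts.items():
        limit = limits[kind]
        if count >= threshold * limit:
            alerts.append(f"{kind}: {count}/{limit} ({count / limit:.0%})")
    return alerts


# Placeholder numbers for illustration only; not real array limits.
example_counts = {"volumes": 450, "snapshots": 900}
example_limits = {"volumes": 500, "snapshots": 5000}

alerts = object_count_alerts(example_counts, example_limits)
for line in alerts:
    print("WARNING", line)
```

Running the check on both the source and target arrays after each cleanup makes it easier to notice when objects that could not be eradicated start to accumulate.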