Pure Technical Services

SRM User Guide: Pod-based Periodic Replication SRM Workflow Behavior

Pod-based periodic replication on the FlashArray is replication managed by FlashArray protection groups that originate in a pod (stretched or unstretched). The FlashArray SRA supports protection group replication that is FlashArray to FlashArray. For SRM workflows this consists of test recovery, recovery, reprotect, and failback.

Prerequisites

For SRM to run any operations on FlashArray-hosted datastores/RDMs, the following requirements must be met:

  1. FlashArray SRA version 3.1 or later.
  2. Replication must be configured on those volumes.
  3. Array managers must be configured.
  4. Array pairs protecting those volumes must be enabled.
  5. Volumes must be discovered by SRM.
  6. The volumes must be added to a protection group.
  7. The protection group must be in at least one recovery plan.
  8. Hosts and host groups must be pre-created on the recovery FlashArray for the target recovery clusters.

There is no need to pre-create any recovery volumes or pre-connect anything to the recovery hosts on the FlashArray(s). The SRA automates the process of creating recovery volumes and connecting them to the appropriate host resources.

Pure Storage recommends always recovering all datastores in a pod together. While you can separate different groups of datastores from the same pod into different SRM protection groups/recovery plans for testing, it is advisable to recover everything in a pod together during an actual recovery. The reason is that for failback the pod must be un-stretched; if applications are still running within the pod, the unstretch operation will adversely affect their availability until the recovery is complete. For this reason, it is best to put non-SRM-protected volumes, or any datastores that you might want to fail over independently, in their own pod.

Example Environment

For the following recovery workflows, this environment was used:

Four VMFS datastores. More details of the datastores can be viewed if you have the FlashArray vSphere Plugin installed:

clipboard_ea58261590a9327593dea7f55370d52a8.png

 

These are hosted on four corresponding volumes in a pod called srmPod stretched across two arrays, flasharray-m50-1 and flasharray-m50-2:

clipboard_e0aa9512ad77f03618046bd748530ae91.png

These four volumes are then periodically replicated via a FlashArray protection group called srmPod::srmPodPG01 to a third array called flasharray-m20-1:

clipboard_e5626d4e07b85de9068ff2485d7baf5f2.png

These volumes are discovered as replicated by SRM in the array pair srmPod <-> flasharray-m20-1:

clipboard_ef77e0fd15a0d3351bdf8ab934b1f5f73.png

These datastores are in an SRM protection group called srm-pg01:

clipboard_eafd1c68b74aa452323862ff67914c768.png

This protection group is in an SRM recovery plan called srm-rp-01:

clipboard_e10fb4dc3f05248c026a45159552ab3cb.png

 

Test Recovery

Recovery to a third site from a pod means that the volumes are configured and protected differently depending on which site they are failed over to (the async target site or back to the pod-based site). A test recovery operation is also different from an array perspective. This section covers two scenarios:

  1. Test recovery to an asynchronous target site of VMs residing on datastores in a pod.
  2. Test recovery back to a pod of VMs residing on datastores at an asynchronous site.

Test Recovery from a Pod to Asynchronous Target Site

One of the primary benefits of Site Recovery Manager is the ability to test recovery plans--coordinating the failover of the replication and recovery of virtual machines without affecting the configuration, state, protection, or connectivity of the original source VMs.

Best Practice: Test recovery plans and test them often. Do it on a schedule as well as right after any changes in your replicated environment.

The high-level process of a test recovery is:

  1. Issue a synchronization of the relevant FlashArray volumes to the target.
  2. Create new FlashArray volumes from the replicated snapshots on the target array.
  3. Connect the volumes to the appropriate hosts and/or host groups.
  4. Rescan the target cluster(s).
  5. Resignature and mount the datastores.
  6. Power-on the VMs and configure them according to the recovery plan.
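
The sequence above can be sketched as a simple simulation. This is illustrative only--the function and step names are hypothetical and this is not the SRA's actual code; it just records the order of operations the SRA and SRM perform:

```python
# Hypothetical simulation of the test-recovery sequence described above.
# Step names and the function itself are illustrative, not SRA APIs.

def run_test_recovery(volumes, replicate_recent_changes=True):
    steps = []
    if replicate_recent_changes:
        steps.append("synchronize")  # 1. new replication point-in-time
    for vol in volumes:
        # 2. recovery volume created from the replicated snapshot
        steps.append(f"create {vol}-puresra-testFailover")
        # 3. connected to the appropriate hosts/host groups
        steps.append(f"connect {vol}-puresra-testFailover")
    steps.append("rescan")                 # 4. rescan target cluster(s)
    steps.append("resignature-and-mount")  # 5. datastores
    steps.append("power-on-vms")           # 6. per the recovery plan
    return steps
```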

To initiate a test, click the Test button in SRM:

clipboard_ecec663600e1efa4ca26a530411f6bef1.png

A wizard will appear, with a default option called Replicate recent changes to recovery site selected:

clipboard_ebdedc0409643adff5fe7648ab4b39a57.png

If this is selected, the SRA will create a new replication point-in-time. If you de-select it, the SRA will just use the latest point-in-time found for each volume. Note, though, that during failover the SRA always uses the latest available point-in-time. So if the SRA creates a new one for the test, and a newer point-in-time is created between its completion and the creation of the recovery volumes, that newer one will be used for recovery.
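
The selection logic can be sketched in Python. This is illustrative only--the snapshot names and timestamps are hypothetical; the point is that the newest point-in-time always wins, regardless of how it was created:

```python
# Illustrative model of how the SRA picks a point-in-time (PIT).
# pits: list of (name, created_timestamp) tuples. Names are hypothetical.

def select_recovery_pit(pits, replicate_recent_changes, now):
    if replicate_recent_changes:
        # SRA-created snapshot: a UUID plus the -puresra suffix.
        pits = pits + [("new-puresra", now)]
    # Regardless of how it was created, the latest available PIT is used.
    return max(pits, key=lambda p: p[1])
```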

Click Next.

clipboard_e0ce46d814323fb8c4a1a6b76e7247faf.png

Confirm the test and click Finish.

If you kept "replicate recent changes" selected, the SRA will initiate a new point-in-time on the target array and will name the resulting protection group snapshot with a UUID and the suffix -puresra:

clipboard_e35933b06fd772e70bb30778a0fd7fa81.png

The SRA will create a new replication point-in-time and will apply the retention policy specified in the protection group to it. So the protection group snapshot will be destroyed and eradicated according to the schedule. If preferred, you can manually destroy/eradicate it earlier than the retention policy dictates.

The SRA will then create one new volume for each protected volume on the target array with the original name of the volume plus a suffix of -puresra-testFailover:

clipboard_e097a337e57c5608d412a9bcc5d54720a.png

Note that the recovery volumes will not be in a pod. Since periodic replication does not support a pod as a target, the volumes are replicated to, and recovered in, the root of the array (not in any pod).

They will then automatically be connected to the appropriate hosts and/or host groups. SRM will then rescan the cluster and the datastores will be resignatured and mounted:

clipboard_ed647a64a5e51a85e1e8367f3e89b647e.png

Note that SRM will apply a name prefix of "snap-XXXXXXXX" to the datastores. This can be automatically removed by enabling the advanced SRM setting described in Site Recovery Manager Advanced Options.

The VMs will be registered and configured according to your recovery plan.

clipboard_e436d256a0668f068fdcfd57fa677cfd9.png

Do not rename the test failover volumes during the test--this will cause them to not be cleaned up and you will need to destroy them manually.

Test Recovery from an Asynchronous Target Site to a Pod

One of the primary benefits of Site Recovery Manager is the ability to test recovery plans--coordinating the failover of the replication and recovery of virtual machines without affecting the configuration, state, protection, or connectivity of the original source VMs.

Best Practice: Test recovery plans and test them often. Do it on a schedule as well as right after any changes in your replicated environment.

This workflow applies after a recovery plan that included datastores in a pod has been failed over to the asynchronous third site and then reprotected. It is a test of a recovery back to the original site, into a pod.

The high-level process of a test recovery is: 

  1. Issue a synchronization of the relevant FlashArray volumes to the target.
  2. Create new FlashArray volumes from the replicated snapshots on the target array.
  3. Move them into the pod.
  4. Connect the volumes to the appropriate hosts and/or host groups.
  5. Rescan the target cluster(s).
  6. Resignature and mount the datastores.
  7. Power-on the VMs and configure them according to the recovery plan.
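
Step 3 is the difference from the earlier workflow: the recovered volumes are moved into the pod, which prefixes their names with the pod name and "::". A minimal sketch (function name hypothetical):

```python
# Illustrative: a volume moved into a FlashArray pod takes the name
# "<pod>::<volume>". Any existing pod prefix is stripped first, since a
# volume name can only carry one pod prefix.

def move_into_pod(pod, volume):
    base = volume.split("::")[-1]
    return f"{pod}::{base}"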

To initiate a test, click the Test button in SRM:

clipboard_ecec663600e1efa4ca26a530411f6bef1.png

A wizard will appear, with a default option called Replicate recent changes to recovery site selected:

clipboard_ebdedc0409643adff5fe7648ab4b39a57.png

If this is selected, the SRA will create a new replication point-in-time. If you de-select it, the SRA will just use the latest point-in-time found for each volume. Note, though, that during failover the SRA always uses the latest available point-in-time. So if the SRA creates a new one for the test, and a newer point-in-time is created between its completion and the creation of the recovery volumes, that newer one will be used for recovery.

Click Next.

clipboard_e0ce46d814323fb8c4a1a6b76e7247faf.png

Confirm the test and click Finish.

For a test recovery to succeed, the target pod must be unstretched first. If it is not, the synchronization step will fail with the following error:

clipboard_e36059642b5ba493065511b45213638d3.png

Unstretch the pod and try again.

The synchronization step will create a new point-in-time (if "Replicate recent changes" was selected) with a name consisting of a UUID and "-puresra":

clipboard_e8be8f3b4fb73512c5f83d7ca9002e774.png

The test will create new volumes from the replicated snapshots with the suffix "-puresra-testFailover":

clipboard_e2bd822c77e5d2ed851ba890ea3bcdd7f.png

These new volumes will be placed in the pod. Note, as seen in the above screenshot, that if the original volumes with the suffix "-puresra-demoted" still exist, they will not be re-used for the test--new volumes will still be created. For an actual recovery, though, these volumes (if present) will be used instead of creating new ones.
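
This test-versus-recovery distinction can be modeled as a small decision function. Illustrative only--the function name is hypothetical, and the suffix casing follows the doc's usage:

```python
# Models the behavior above: tests always get brand-new volumes; a real
# recovery re-uses an existing "<name>-puresra-demoted" volume if present.

def recovery_target_volume(source_volume, existing_volumes, is_test):
    demoted = source_volume + "-puresra-demoted"
    if not is_test and demoted in existing_volumes:
        return demoted  # re-used for an actual recovery
    suffix = "-puresra-testFailover" if is_test else "-puresra-Failover"
    return source_volume + suffix  # a new volume is created
```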

To test the environment, you can re-stretch the pod once the test recovery completes, but it is important to un-stretch it prior to the cleanup operation of the test so that the test can be re-run and an eventual actual recovery remains possible.

The datastores will be connected to the host/host group(s):

clipboard_ecaf4abe39cc12250e651a02e63f9241a.png

Once complete, SRM will resignature and mount them:

clipboard_e62a7ea1b343623d53a3b6e6ea8971cc6.png

The resignature process will add a name prefix of snap-XXXXXXXX to them. SRM can be configured to automatically remove the prefix through advanced configuration documented here.

Test Recovery Cleanup

Once you have verified the test, click the cleanup button.

clipboard_e4f3bcd26da2c07d25d756e98ed1b57aa.png

The cleanup process will power-off the VMs and un-register them. The datastores will be unmounted and detached from the hosts. The SRA will then disconnect the volumes from all hosts, destroy them, and then eradicate them.

This resets the process to the original state. The only objects that will remain through a cleanup are any point-in-time protection group snapshots that were created by the test. Those will be destroyed according to their retention policy.

Recovery & Reprotect

Recovery to a third site from a pod means that the volumes are configured and protected differently depending on what site they are failed over to (the async target site or back to the pod-based site). A recovery operation is also different from an array perspective.

This section covers four scenarios:

  1. Recovery to an asynchronous target site of datastores in a pod.
  2. Reprotection of datastores that have been recovered to an asynchronous site.
  3. Recovery back to a pod of datastores that are at an asynchronous site.
  4. Reprotection of datastores that have been recovered to a pod from an asynchronous site.

Recovery from a Pod to an Asynchronous Target Site

Site Recovery Manager offers two main modes of failover: Planned Migration and Disaster Recovery. A planned migration will fail if any problems are encountered at the source or target site. A disaster recovery operation will tolerate up to a full failure of the source site resources and still recover the virtual machines. It is recommended to run a planned migration operation if possible, as this will ensure the cleanest failover of the environment. Furthermore, if a disaster recovery event is run, it is likely that manual cleanup of the source site will be required once resources are back online.

The high-level process of a recovery is:

  1. Issue a synchronization of the relevant FlashArray volumes to the target.
  2. Shutdown the production side virtual machines, unregister the VMs, and unmount the datastores.
  3. Synchronize the relevant FlashArray volumes again to the target.
  4. Create new FlashArray volumes from the replicated snapshots on the target array.
  5. Connect the volumes to the appropriate hosts and/or host groups.
  6. Rescan the target cluster(s).
  7. Resignature and mount the datastores.
  8. Power-on the VMs and configure them according to the recovery plan.

To start a recovery, click on the Run button on the recovery plan:

clipboard_ea6fa586556db0d79c1b5bd1254b7c978.png

Confirm the type of the recovery and the details of the operation and click Next then Finish.

clipboard_ea9257bab445db3c45b85b3a72c74101f.png

The data will be synchronized twice to the target FlashArray. Once before the VMs are shutdown, and once after:

clipboard_e61c116e6275ccae5f757721c64afbe1d.png

 

On the corresponding FlashArray protection groups you can see a new protection group snapshot created for each synchronization with a -puresra suffix (preceded by a random UUID) in the snapshot name.

clipboard_e0ca050d20f11de191b70427edfce6914.png

The second point-in-time will likely be the one used for recovery, but if a new point-in-time is created between the second synchronization and the subsequent step, the latest one will be used.

Upon the step called Configure recovery site storage, the replicated snapshots will be copied to new FlashArray volumes on the target FlashArray.

The volumes will be named with the same name as their source with a suffix of -puresra-Failover added:

clipboard_ef566b86fc2876cabe5ae549ad240b5b7.png

The original source volumes will be disconnected from their hosts and also renamed at this point. The SRA will add the suffix of -puresra-demoted to those volume names:

clipboard_e4d3ae8da8076e403cdeedd4a7c350057.png

It is recommended not to delete or rename the source volumes (the volumes with -puresra-demoted in the name) after a failover and prior to a reprotect. If a volume is renamed during this window, the SRA will not be able to find the original protection of the source volumes. It will therefore just create a default protection group called PureSRADefaultProtectionGroup with replication enabled back to the source array.
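
The reason the rename breaks reprotect is that the SRA locates the original source purely by name. A sketch of that lookup and its fallback (function and variable names hypothetical):

```python
# Illustrates why renaming "-puresra-demoted" volumes breaks reprotect:
# the SRA derives the expected demoted name from the failed-over volume
# and falls back to a default protection group if it is not found.

DEFAULT_PGROUP = "PureSRADefaultProtectionGroup"

def find_reprotect_pgroups(failed_over_volume, source_volumes, pgroups_by_volume):
    expected = failed_over_volume.replace("-puresra-Failover", "") + "-puresra-demoted"
    if expected not in source_volumes:
        return [DEFAULT_PGROUP]  # fallback: default replication policy
    return pgroups_by_volume.get(expected, [DEFAULT_PGROUP])
```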

The device discovery screen will show the device pair(s) as Failover Complete.

clipboard_e5a9d23afcc76df6eeda8277d8cf8d40f.png

The recovery volumes are connected to the appropriate hosts or host groups on the recovery site, and SRM will resignature and mount them. The resignature process will add a name prefix of snap-XXXXXXXX to them. SRM can be configured to automatically remove the prefix through advanced configuration documented here.

clipboard_e2f98d22e50ee77e572d20dc86e417b6d.png

The virtual machines will be registered, configured, and then powered-on according to the recovery plan.

clipboard_e67a1a1eb89f38ef98d30b5a6b0a7e07e.png

Reprotection of an Asynchronous Target Site

A reprotect operation automates the replication of failed over volumes back to the original FlashArray. In order to run reprotect, a fully successful recovery must occur. If the disaster recovery operation was executed and skipped multiple steps due to failures, a reprotect might not be possible. In this case manual reprotection might be required--which is essentially the same process as setting up replication for the first time. If a reprotect fails, try again with the Force Cleanup option checked (it will only appear if a reprotect has failed once).

Once a recovery operation has completed, it is recommended to run the reprotect as soon as possible. This will ensure the data being generated on the now production site is being protected.

The reprotect operation does the following things:

  1. Sets up replication on the FlashArray(s) that are now running the VMs
  2. Reverses the SRM protection groups and recovery plans
  3. Initiates a synchronization of the data.

To start a reprotect, click on the desired SRM recovery plan and click Reprotect.

clipboard_ec646e095335908f454c7154447016fb2.png

Confirm the action and click Next, then Finish.

clipboard_e70d00dd534f903c90cbc52814c5e1114.png

 

Note that prior to reprotect, if the original pod was stretched, it must be unstretched. This is because protection groups cannot replicate back into a pod, and volumes cannot be moved into a stretched pod. To make sure that a failback will work, the SRA ensures that the original pod has been unstretched first. If you did not unstretch it, you will see the following error in SRM:

clipboard_ef01f0c1cf99eeecb04317b5326597ab2.png

It is important to understand that the SRA will never unstretch or stretch a pod back for you--you must do this yourself.

clipboard_e487993f5a3e167f9300a1d62a9f82e8f.png

You can unstretch from either array--just ensure that array is properly configured as a target in the SRM array managers.

The SRA will then look for the FlashArray protection group or groups of the source volume(s).

The process follows these rules:

  • The SRA will look for the original source volume in the pod; if it cannot find it, the SRA will set up a new protection group on the asynchronous target FlashArray--see below for details. The source volume is identified via the volume name--so if it has been renamed on the source, the lookup will fail.
    • Protection groups on the pod that include the original source volume will be created on the asynchronous target FlashArray even if they do not have replication enabled and/or replicate to a different array.
    • Protection group name matching is not case-sensitive. So if the pod has a protection group called srm-pg and the asynchronous target FlashArray has one called SRM-PG, they will be considered the same and no new group will be created.
    • For protection groups created by the SRA, it will match the replication and local snapshot policy.
    • Protection groups that are created by the SRA during reprotect will only add the original FlashArray as a replication target--no matter how many targets were in the original protection group. If you would like the created protection group to also replicate to other arrays, add them manually as targets later.
  • If the original source volume in the pod is in more than one protection group, the SRA will re-create all of those protection groups on the asynchronous target FlashArray.
  • If a protection group with the same name already exists on the asynchronous target FlashArray, the SRA will not re-create the group and will put the volume in that identified group.
    • If a pre-existing protection group's policy does not match the original source protection group's policy, it will still be used. The SRA will not update the identified pre-existing protection group with the policy from the original array.
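
The case-insensitive matching rule can be sketched as follows. Illustrative only--the function and return values are hypothetical, but the behavior mirrors the rules above: an existing group is re-used (keeping its own policy), otherwise a new group is created:

```python
# Sketch of the protection group matching rule: names are compared
# case-insensitively, and an existing group is re-used rather than re-created.

def resolve_target_pgroup(source_pgroup, target_pgroups):
    for existing in target_pgroups:
        if existing.lower() == source_pgroup.lower():
            return existing, False  # re-use; its own policy is kept
    return source_pgroup, True      # create, matching the source's policies
```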

Once the protection has been re-created on the now production side, the original source volume(s) will be removed from protection groups that replicate to the now production array. If the volume is in protection groups that do not replicate to the now production array, the volume will be left in those.

The SRA will re-create (or re-use) the protection groups on the now production FlashArray. Newly-created (and potentially pre-existing) protection groups will be replicating back to the FlashArray where the pod currently exists. It is recommended to manually add the 2nd FlashArray (the FlashArray which the pod was unstretched from prior to reprotect) to these protection groups as well as a replication target.

The pod used to be on flasharray-m50-1 and flasharray-m50-2:

clipboard_e2e919d57225e3285f17a33a13e8255cc.png

And was unstretched to only exist on flasharray-m50-1:

clipboard_edf86eec678d5cb7d32676f8a7622cefa.png

The protection group created on the target (now production) FlashArray will only have the array that currently hosts the pod as a target, in this case flasharray-m50-1:

clipboard_e036ce808102acc53631355ec264731c2.png

So it is recommended to manually add flasharray-m50-2 as a target as well:

clipboard_ec1275cd0701d78711858a3ce54cf3104.png

The reason for this recommendation is that if the 3rd array replicates snapshots back to both arrays, both arrays will be seeded with the data. So when the original pod is stretched back to the 2nd array (in this case flasharray-m50-2), it will take much less time, as most of the data is already there via periodic "seeding" from the protection group. Note that this is a recommendation, not a requirement.

If the SRA cannot find the original source volume (and therefore cannot identify the correct protection groups) the SRA will instead create a protection group with the default replication policy with the name PureSRADefaultProtectionGroup replicating back to the original array. You may edit and change (or even remove) this protection group as desired afterwards. Just ensure that the desired volumes are still replicated by a protection group.

clipboard_eabc7f1c840c54f86930d6966a015d1b0.png

The reprotect operation will then complete, swapping the direction of the SRM objects and reconfiguring protection of the virtual machines in the recovery plan.

clipboard_eab0a419a6a5235cd4e26fc8bb47cee69.png

The reprotect operation will rename the volume on the target array to remove the applied suffix of -puresra-failover:
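
A sketch of that rename, including the collision failure mode described in the list below (where a volume with the original name already exists). Illustrative only--the function name is hypothetical, and the suffix is matched case-insensitively because the doc shows both -puresra-Failover and -puresra-failover:

```python
# Illustrative model of the reprotect rename and its collision failure.

def reprotect_rename(volume, existing_volumes):
    suffix = "-puresra-failover"
    if not volume.lower().endswith(suffix):
        raise ValueError(f"unexpected volume name: {volume}")
    original = volume[: -len(suffix)]
    if original in existing_volumes:
        # Mirrors the reverseReplication failure: a volume with the original
        # name already exists (possibly destroyed but awaiting eradication).
        raise RuntimeError(f"Could not rename failed over volume to {original}")
    return original
```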

clipboard_e9eaae363c051f213155cfdf0d4b7f86f.png

If failures occur, the Force Cleanup option will become available--it is highly encouraged not to resort to that option immediately. It is advised to attempt to figure out and resolve the underlying problem and re-run the reprotect without Force Cleanup selected until successful.

The main reasons a reprotect could fail are often in the VMware environment (placeholders aren't there, mappings are incorrect or missing, etc.), though some FlashArray failures can cause it too:

  • Replication connection does not exist back to the array. If so, re-create this connection.
  • Array managers are configured incorrectly
  • Original source volume was renamed. It should be in the form of <volume name>-puresra-demoted. If it is not, rename it back. You will see an error like: SRA command 'prepareReverseReplication' failed for device '<volume name>'. Cannot find the volume <volume name> on the array <arrayname>. Please make sure that replication setup is correct. If it fails in the prepareReverseReplication phase, likely the source volume has been renamed manually or destroyed. Either rename it back, or if it is gone, run an SRM device discovery and then re-attempt the reprotect without force cleanup checked. Only if that fails should you then retry with force cleanup checked.
  • The source pod was renamed. If so, rename it back to the original name, or remove the SRM protection group and create a new one.
  • The pod is stretched across two arrays. Unstretch it from one of the arrays. If you do not want to do so, remove the SRM protection group and create a new one.
  • Target volume was renamed. It should be in the form of <volume name>-puresra-failover. If it is not, rename it back. You will see an error like: SRA command 'reverseReplication' failed for device '<volume name>'. Cannot find the volume '<volume name>' on the array <arrayname>.
    • If this is the case, fix the name of the volume AND ensure that the original volume is still in a protection group replicating to the target. The reprotect operation will have removed it from any replication groups at this stage causing device discovery to fail.
  • A volume exists with the original name but no suffix. If the original volume was srm-DS1 and it was failed over to the 2nd array, the failover volume will be called srm-DS1-puresra-failover. If there is a volume on the failover array called srm-DS1 already the initial recovery will not fail, but the reprotect will as it will try to rename the volume from srm-DS1-puresra-failover to srm-DS1 which will fail because there is already a volume with that name. While this is unlikely to occur, it could. You will see an error upon reprotect like: Failed to reverse replication for device 'peer-of-53f027a7-828d-4b4d-a3a8-d4b2c8364507:srmDS-08'. SRA command 'reverseReplication' failed for device 'peer-of-53f027a7-828d-4b4d-a3a8-d4b2c8364507:srmDS-08'. Could not rename failed over volume srmDS-08 to peer-of-53f027a7-828d-4b4d-a3a8-d4b2c8364507:srmDS-08 on array flasharray-m20-1. Note that the volume might be in the destroyed volume folder awaiting eradication if you cannot see it.
     

Re-creating a Destroyed Pod Prior to a Failback

If, prior to a failback, an error like the following is encountered in array discovery:

clipboard_ebd1102589fd2360edc259496c8682d33.png

One of two things has likely happened:

  1. The pod was destroyed. Check the "destroyed" pod list to see if it was destroyed recently. If so, restore it.
  2. The pod was renamed. In this case, rename it back, or create a new pod with the correct name.

In the case of the first situation, follow these instructions:

You should re-create the pod with that name (in this case srmPod) on any array configured in an array manager that has a replication connection back to the current production array.

clipboard_e016543e25f5ff5295b674daf04f01254.png

Once re-created, array discovery will succeed.

clipboard_ee5f006d954adb7af83ce8bf6e9ba1685.png

A failback to the re-created pod can now occur and the recovered volumes will be created in the pod:

clipboard_ee4c55bd6e76ebed6b3c1ae728de4efa6.png


Recovery from an Asynchronous Site to a Pod-based Site

Site Recovery Manager offers two main modes of failover: Planned Migration and Disaster Recovery. A planned migration will fail if any problems are encountered at the source or target site. A disaster recovery operation will tolerate up to a full failure of the source site resources and still recover the virtual machines. It is recommended to run a planned migration operation if possible, as this will ensure the cleanest failover of the environment. Furthermore, if a disaster recovery event is run, it is likely that manual cleanup of the source site will be required once resources are back online.

The high-level process of a recovery is:

  1. Issue a synchronization of the relevant FlashArray volumes to the target.
  2. Shutdown the production side virtual machines, unregister the VMs, and unmount the datastores.
  3. Synchronize the relevant FlashArray volumes again to the target.
  4. Create new FlashArray volumes from the replicated snapshots on the target array.
  5. Move the volumes into the pod.
  6. Connect the volumes to the appropriate hosts and/or host groups.
  7. Rescan the target cluster(s).
  8. Resignature and mount the datastores.
  9. Power-on the VMs and configure them according to the recovery plan.

To start a recovery, click on the Run button on the recovery plan:

clipboard_ea6fa586556db0d79c1b5bd1254b7c978.png

Confirm the type of the recovery and the details of the operation and click Next then Finish.

clipboard_e0bf19c58cf73d437dfdfdd5f3e4bc578.png

If the pod is stretched, the synchronization will fail. Unstretch the pod and try again:

clipboard_e277dd4d7891770e06ebfdb46190b392a.png

The data will be synchronized twice to the target FlashArray. Once before the VMs are shutdown, and once after:

clipboard_e61c116e6275ccae5f757721c64afbe1d.png

 

On the corresponding FlashArray protection groups on the array currently hosting the pod, you can see a new protection group snapshot created for each synchronization with a name of a random UUID and "-puresra".

clipboard_e149bdbd8e0f40688733f1b4037ff1d91.png

The second point-in-time will likely be the one used for recovery, but if a new point-in-time is created between the second synchronization and the subsequent step, the latest one will be used.

Upon the step called Configure recovery site storage, the replicated snapshots will be copied to new FlashArray volumes on the target FlashArray.

The volumes will be named with the same name as their source with a suffix of -puresra-Failover added. If the original volumes are still there with a suffix of "-puresra-demoted", those existing volumes will be re-used.

clipboard_e0304fa605e0f7a9511424146452cd720.png

 

The original source volumes will be disconnected from their hosts and also renamed at this point. The SRA will add the suffix of -puresra-demoted to those volume names:

clipboard_eaf1580396c1c4313d583f63af07cc3b9.png

It is recommended not to delete or rename the source volumes (the volumes with -puresra-demoted in the name) after a failover and prior to a reprotect. If a volume is renamed during this window, the SRA will not be able to find the original protection of the source volumes. It will therefore just create a default protection group called PureSRADefaultProtectionGroup with replication enabled back to the source array.

The device discovery screen will show the device pair(s) as Failover Complete.

clipboard_e43e50973c1dd002bcab2f6bc397e9eb8.png

The recovery volumes are connected to the appropriate hosts or host groups on the recovery site, and SRM will resignature and mount them. The resignature process will add a name prefix of snap-XXXXXXXX to them. SRM can be configured to automatically remove the prefix through advanced configuration documented here.

clipboard_e4ed5c0ffa8773d7771301c6cc947878b.png

The virtual machines will be registered, configured, and then powered-on according to the recovery plan.

clipboard_e67a1a1eb89f38ef98d30b5a6b0a7e07e.png

Reprotection of a Pod-based Site

Once the volumes have been successfully recovered into a pod, the pod can be immediately re-stretched if desired to a 2nd array. This can be done before or after a reprotect operation.

clipboard_e06e92bb6c7ac14b76b82fd8dbe4b0d48.png

If the pod was originally stretched it is recommended to re-stretch it prior to reprotection, to ensure the highest protection level provided by ActiveCluster is enabled as soon as possible.

clipboard_eb5b3b555b2c14335bcc9f2c07519a4b7.png

To start a reprotect, click Reprotect:

clipboard_e9e494eecbce2802acd1bdaa597e77b7d.png

Complete the wizard to confirm the reprotection:

clipboard_ee2ab8554aaf71d66284a84494d483c1c.png

The process follows these rules:

  • The SRA will look for the original source volumes on the asynchronous target array, if it cannot find it, the SRA will setup a new protection group on the pod-owning FlashArray--see below for details. The source volume is identified via the volume name--so if it has been renamed on the source the lookup will fail.
    • Protection groups on the asynchronous target array that include the original source volume will be created on the pod-owning FlashArray even if they do not have replication enabled and/or replicate to a different array
    • Protection group name matching is not case-sensitive. So if the asynchronous target array has a protection group called srm-pg and the pod-owning FlashArray has one called SRM-PG, they will be considered the same and no new group will be created.
    • For protection groups created by the SRA, the SRA will match the replication and local snapshot policy.
    • Protection groups that are created by the SRA during reprotect will only add the originating FlashArray as a replication target--no matter how many targets were in the original protection group. If you would like the created protection group to also replicate to other arrays, add them manually as targets later.
  • If the original source volume on the asynchronous target array is in more than one protection group, the SRA will re-create all protection groups on the pod-owning FlashArray.
  • If a protection group with the same name already exists on the pod-owning FlashArray, the SRA will not re-create the group and will instead add the volume to the identified group.
    • If a pre-existing protection group's policy does not match the original source protection group's policy, the pre-existing group will still be used. The SRA will not update the identified pre-existing protection group with the policy from the original array.

If the SRA cannot find the original source volume (and therefore cannot identify the correct protection groups) or those volumes are in no protection groups on the asynchronous target array, the SRA will instead create a protection group with the default replication policy with the name PureSRADefaultProtectionGroup replicating back to the original array.
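The selection rules above can be summarized in a short sketch. This is an illustrative approximation of the documented behavior, not the actual SRA source; the function name and data shapes are hypothetical, while the group name PureSRADefaultProtectionGroup and the case-insensitive matching rule come from this guide.

```python
# Hypothetical sketch of how the SRA chooses protection groups on the
# pod-owning FlashArray during reprotect, per the rules in this guide.
DEFAULT_PG = "PureSRADefaultProtectionGroup"

def pick_protection_groups(source_volume, source_pgs, existing_target_pgs):
    """Return (groups_to_use, groups_to_create).

    source_volume: name of the original source volume on the asynchronous
                   target array, or None if it could not be found.
    source_pgs: names of protection groups containing that volume.
    existing_target_pgs: group names already on the pod-owning FlashArray.
    """
    if source_volume is None or not source_pgs:
        # Volume not found, or in no protection groups: fall back to the
        # default group replicating back to the original array.
        if DEFAULT_PG.lower() in (n.lower() for n in existing_target_pgs):
            return [DEFAULT_PG], []
        return [DEFAULT_PG], [DEFAULT_PG]

    use, create = [], []
    existing_lower = {n.lower(): n for n in existing_target_pgs}
    for pg in source_pgs:
        # Name matching is not case-sensitive: srm-pg matches SRM-PG.
        match = existing_lower.get(pg.lower())
        if match:
            use.append(match)   # reuse the pre-existing group as-is
        else:
            use.append(pg)
            create.append(pg)   # re-created with matching policies
    return use, create
```

For example, a source group named srm-pg would be matched to a pre-existing SRM-PG on the pod-owning array, and no new group would be created.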


You may edit or even remove this protection group afterwards as desired. Just ensure that the desired volumes are still replicated by a protection group.


The reprotect operation will rename the volume on the target array to remove the applied suffix of -puresra-failover.
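A minimal sketch of this rename, assuming only the -puresra-failover suffix documented here (the helper function itself is ours, not part of the SRA):

```python
# Suffix applied to failed-over volumes at recovery, per this guide.
FAILOVER_SUFFIX = "-puresra-failover"

def original_volume_name(failed_over_name: str) -> str:
    """Return the pre-failover volume name by stripping the SRA suffix."""
    if failed_over_name.endswith(FAILOVER_SUFFIX):
        return failed_over_name[: -len(FAILOVER_SUFFIX)]
    # Already renamed, or not an SRA failover volume: leave unchanged.
    return failed_over_name
```

So a failed-over volume named srm-DS1-puresra-failover is renamed back to srm-DS1 during reprotect.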


If failures occur, the Force Cleanup option will become available--do not resort to that option immediately. Instead, attempt to identify and resolve the underlying problem, then re-run the reprotect without Force Cleanup selected until it succeeds.

The main reasons a reprotect fails are usually in the VMware environment (missing placeholders, incorrect or missing mappings, etc.), though some FlashArray-side issues can cause failures as well:

  • The replication connection back to the asynchronous target array does not exist. If so, re-create this connection.
  • Array managers are configured incorrectly.
  • Volumes on the asynchronous target array were renamed. The volume should be named <volume name>-puresra-demoted; if it is not, rename it back. You will see an error like: SRA command 'prepareReverseReplication' failed for device '<volume name>'. Cannot find the volume <volume name> on the array <arrayname>. Please make sure that replication setup is correct.
    • If the operation fails in the prepareReverseReplication phase, the source volume has likely been renamed manually or destroyed. Either rename it back, or if it is gone, run an SRM device discovery and then re-attempt the reprotect without Force Cleanup checked. Only if that fails should you retry with Force Cleanup checked.
  • The target volume was renamed. It should be named <volume name>-puresra-failover; if it is not, rename it back. You will see an error like: SRA command 'reverseReplication' failed for device '<volume name>'. Cannot find the volume '<volume name>' on the array <arrayname>.
    • If this is the case, fix the name of the volume AND ensure that the original volume is still in a protection group replicating to the target. The reprotect operation will have removed it from any replication groups at this stage, causing device discovery to fail.
  • A volume already exists with the original name and no suffix. If the original volume was srm-DS1 and it was failed over to the second array, the failed-over volume will be named srm-DS1-puresra-failover. If a volume named srm-DS1 already exists on the failover array, the initial recovery will not fail, but the reprotect will: it attempts to rename srm-DS1-puresra-failover back to srm-DS1, which fails because a volume with that name already exists. While this is unlikely, it can occur. You will see an error upon reprotect like: Failed to reverse replication for device 'peer-of-53f027a7-828d-4b4d-a3a8-d4b2c8364507:srmDS-08'. SRA command 'reverseReplication' failed for device 'peer-of-53f027a7-828d-4b4d-a3a8-d4b2c8364507:srmDS-08'. Could not rename failed over volume srmDS-08 to peer-of-53f027a7-828d-4b4d-a3a8-d4b2c8364507:srmDS-08 on array flasharray-m20-1. Note that if you cannot see the conflicting volume, it might be in the destroyed volumes folder awaiting eradication.
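The three naming-related failure modes above can be checked before re-running a reprotect. The sketch below is a hypothetical helper that only inspects lists of volume names; querying the arrays themselves (CLI, REST, etc.) is deliberately left out. The -puresra-demoted and -puresra-failover suffixes come from this guide; the function name and messages are ours.

```python
# Suffixes the SRA applies, per this guide.
DEMOTED = "-puresra-demoted"
FAILOVER = "-puresra-failover"

def check_reprotect_names(base_name, source_array_vols, target_array_vols):
    """Return a list of naming problems that would make reprotect fail.

    source_array_vols: volume names on the asynchronous target array.
    target_array_vols: volume names on the pod-owning (recovery) array.
    """
    problems = []
    if base_name + DEMOTED not in source_array_vols:
        problems.append(
            f"{base_name}{DEMOTED} not found on the asynchronous target "
            "array; the source volume may have been renamed or destroyed")
    if base_name + FAILOVER not in target_array_vols:
        problems.append(
            f"{base_name}{FAILOVER} not found on the pod-owning array; "
            "the failed-over volume may have been renamed")
    if base_name in target_array_vols:
        problems.append(
            f"a volume named {base_name} already exists on the pod-owning "
            "array; the rename at reprotect will fail (check the destroyed "
            "volumes folder as well)")
    return problems
```

An empty result means none of the three documented naming problems apply; any returned entries point at which rename to fix before retrying without Force Cleanup.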