Troubleshooting ActiveDR Reprotect Errors in Site Recovery Manager
A Site Recovery Manager reprotect operation with ActiveDR ensures that replication is synchronized between the source and target pods, and also resets the state of the volumes in them (via volume tags) so the environment is ready for a future recovery operation.
There are a few situations that can cause this operation to fail. This KB provides an overview of them. Pure Storage is reviewing these situations and investigating options to allow the SRA to handle them automatically; this KB will be updated if and when improvements are made.
Extra Volume in Pod
Failed to reverse replication for failed over devices. Cannot process consistency group 'peer-of-e184134e-5a94-9f76-5c53-ae33e6df93d5' with role 'target' when expected consistency group with role 'promotedTarget'.
SRA command 'discoverDevices' failed. More than one type of devices found in the pod. Please verify that all the volumes are either untagged or tagged with same value in the "puresra" namespace.
As of the 4.0 release of the SRA, it is not supported to have a volume in the pod that is not in use as a VMFS datastore or an RDM in the vCenter environment controlled by the SRM pair. Volumes that are not in use, are in use in a different vCenter environment, or are in use in a non-VMware environment cannot be in the same pod as volumes controlled by SRM.
It is also not supported to provision a new datastore or RDM into an ActiveDR pod between a recovery and a reprotect.
While violating these requirements will not prevent failover, it will prevent a successful test recovery cleanup and a successful reprotect. A reprotect will fail with the first error message above; a device discovery will instead show the second.
In this case, there is an extra volume in the pod that was provisioned after the recovery was completed; this is evidenced by the fact that it does not have the puresra-failover tag on it.
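You can confirm this from the FlashArray CLI by listing the pod's volumes and their puresra tags (the same commands used later in this article; replace <pod name> with your pod's name):
purevol list <pod name>*
purevol list <pod name>* --tag --namespace puresra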
Note that running a forced reprotect will not work for this failure; a forced reprotect only works around certain VMware-side failures, not SRA failures on device state changes.
There are options, however, depending on what the issue is.
A Non-VMware Volume
If the volume(s) in question are not related to the VMware environment, you must move them out of the pod. As of Purity 6.0.0 you cannot directly move a volume out of a linked ActiveDR pod, so you have a few options:
- Copy it to a new volume outside of the pod, destroy the original, and then connect the copied volume to the host using it. This will be disruptive (see the sketch after this list).
- Disable replication of the ActiveDR pod, move the volume out, then re-enable replication.
- Create a new volume outside of the pod, use host-based tools to move the data to it, and then destroy the original.
- Destroy the volume (assuming, of course, that it has no importance).
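For the first option, a minimal sketch from the FlashArray CLI might look like the following. The pod, volume, new volume, and host names are all placeholders, exact flags can vary by Purity version, and disconnecting and destroying the original is disruptive to any host using it:
purevol copy <pod name>::<volume name> <new volume name>
purevol disconnect --host <host name> <pod name>::<volume name>
purevol destroy <pod name>::<volume name>
purevol connect --host <host name> <new volume name>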
Once one of these is complete, you can re-attempt the reprotect.
A VMware Volume
If the volume is a valid VMFS datastore or RDM, is intended to be in the pod, and removing it would be difficult or inconvenient, it is possible to resolve this without following the steps for a non-VMware volume.
Let's say the fourth volume in the pod is a VMFS datastore that was created after the recovery and before the reprotect.
This causes the reprotect to fail. An option to allow the SRA to bypass this is to manually tag the volume. SSH into the FlashArray hosting the promoted pod.
Review the puresra-failover tags on the current promoted pod (replace <pod name> with the pod name):
purevol list <pod name>* --tag --namespace puresra --filter 'key="puresra-failover"'
Copy the value, which will be a long UUID like 49e593f1-7498-2cd1-86dc-d1ca09f101ff.
Run:
purevol tag <pod name::volume name> --key puresra-failover --value <UUID> --namespace puresra --non-copyable
Replace <pod name::volume name> with the pod name and the name of the volume that is missing the tag, and replace <UUID> with the value you copied above.
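For example, with a hypothetical pod named podA, a hypothetical volume named vol4, and the example UUID from above, the command would look like this:
purevol tag podA::vol4 --key puresra-failover --value 49e593f1-7498-2cd1-86dc-d1ca09f101ff --namespace puresra --non-copyable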
Now all of the volumes in the pod are tagged with puresra-failover.
Second, depending on when the volume was created, it is also likely missing the puresra-demoted tag: if the volume was present before the recovery it will have the tag; if it was created sometime after the recovery it will not.
To check, go to the array where the ActiveDR pod for this pair is promoted and run:
purevol list <pod name>* --tag --namespace puresra --filter 'key="puresra-demoted"'
If that volume does not have the demoted tag, add it as well. Note that this UUID is DIFFERENT from the previous UUID: the value of the puresra-failover tag is the UUID of the promoted pod, while the value of the puresra-demoted tag is the UUID of the demoted pod.
Copy the value, which will be a long UUID like e184134e-5a94-9f76-5c53-ae33e6df93d5.
Run:
purevol tag <pod name::volume name> --key puresra-demoted --value <UUID> --namespace puresra --non-copyable
Replace <pod name::volume name> with the pod name and the name of the volume that is missing the tag, and replace <UUID> with the value you copied above.
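Continuing the hypothetical example from above, the command would look like this:
purevol tag podA::vol4 --key puresra-demoted --value e184134e-5a94-9f76-5c53-ae33e6df93d5 --namespace puresra --non-copyable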
Now all of the volumes in the pod are tagged with puresra-demoted as well.
Next, re-run a device discovery for that pod pair.
If you still see the discovery error, another volume is missing a tag or you did not tag it correctly.
Verify your tags are uniform and repeat the discovery until there are no errors on that pod pair.
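To see every puresra tag in the pod at once and confirm the values are uniform across all volumes, you can run the listing command without the key filter:
purevol list <pod name>* --tag --namespace puresra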
Now re-run the reprotect. The process will run and complete. Note that after the reprotect you will need to edit the protection group and ensure the datastore was added and protected.
While this tagging process will also work for non-VMware volumes, doing so only delays the fix. If the volume is a non-VMware volume, it is best to take care of it now.
Target Pod is Promoted
Expected consistency group 'e184134e-5a94-9f76-5c53-ae33e6df93d5' not found in SRA's 'queryReplicationSettings' response.
During a reprotect, the SRA ensures that the environment is ready for recovery. The SRA only removes tags from the source pod, as that removal propagates to the volumes in the target pod. If the target pod happens to be promoted, however, the tags will not go away until the target pod is demoted, which leaves the target in an unexpected state referred to by SRM as a promotedTarget. At the end of a reprotect it should be a demotedTarget. The SRA returns the promotedTarget status because not only is the pod promoted, but it also still has the tags indicating the demotion.
This will cause the last step of the reprotect (Synchronize Storage) to fail with the error shown above. The reprotect will still succeed, but with this warning. You will also see an error in the device discovery.
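To resolve this, demote the target pod. This can be done from the FlashArray GUI or via the CLI on the target array; a minimal sketch, where <pod name> is a placeholder and, depending on the Purity version, you may be prompted about quiescing before the demotion:
purepod list <pod name>
purepod demote <pod name>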
Once the target pod is demoted, run a device discovery and the warning will go away.
Replication Link is Paused
There is a known issue with the 4.0 release of the SRA and ActiveDR when the replication link between the source and target pods is paused.
The Synchronize Storage step will appear to hang at 100%.
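To confirm the link is paused, you can check the replica link status from the CLI of either FlashArray (the link will show a status of paused; output columns may vary by Purity version):
purepod replica-link list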
Go to either FlashArray in the ActiveDR pod pair, navigate to the pod, and in the Pod Replica Links box click the vertical ellipsis and choose Resume.
Confirm the resume operation. The reprotect will then complete once the pods are synchronized.