Overview of Stretched Storage in SRM
VMware Site Recovery Manager supports two main modes of array-based replication:
- Active/Passive: SRM coordinates with the underlying array-based replication to create copies of datastores or RDMs on the remote site. VMs are shut down and restarted on the target site. The Recovery Point Objective is not guaranteed to be zero: SRM attempts to synchronize any changes during a failover, but in the case of a loss of the source site this may not be possible. The Recovery Time Objective is always non-zero, and its length depends on how much data needs to be synchronized and the size of the recovery.
- Active/Active: SRM coordinates migrating VMs between vCenter environments. This implementation requires synchronous replication (all data is always in both sites) as well as active-active replication (the data is available in both sites at the same time). SRM will attempt to vMotion VMs to the remote vCenter if possible, and otherwise will restart them on the second site. The Recovery Point Objective is always zero, and the Recovery Time Objective may be zero (generally in a migration or disaster avoidance scenario) or non-zero in certain disaster recovery events.
For Pure Storage, this means the use of ActiveCluster. ActiveCluster is enabled when a volume is placed in a pod and that pod is stretched to a second array. Any volume in a stretched pod is now both synchronously replicated and available in an active/active state (both FlashArrays can simultaneously service reads and writes for those volumes).
The 3.1 release of the Pure Storage SRA also supports recovery from a stretched pod to a third FlashArray. This does not leverage the stretched storage support in SRM as the recovery is an active/passive replication scenario. For details on that behavior see the following section.
For the purposes of this document, the following example environment will be used:
- Two volumes in a stretched pod called srmPod.
- The pod is stretched across two physical arrays, flasharray-m50-1 and flasharray-m50-2.
- Each volume hosts a VMFS datastore, named srmDS-01-stretched and srmDS-02-stretched.
With the release of Site Recovery Manager (SRM) 8.5, SRM now supports Raw Device Mapping (RDM) devices in stretched protection groups. This means that Virtual Machines that have RDMs attached to them can be protected with SRM and recovered with SRM when using ActiveCluster.
However, with SRM 8.4 and lower, RDMs are not supported.
While the example in this KB was initially configured with storage policies on an older version of SRM, the following is an example of configuring SRM protection groups of the datastore group type with array-based replication for VMFS and RDMs.
Configuring Protection for Stretched VMFS and RDMs with SRM 8.5 and higher
Configuring protection in SRM for ActiveCluster VMFS and RDM devices has never been easier than with SRM 8.5 and higher. You no longer need to configure datastore tags or storage policies on VMs that need to be protected. Rather, the configuration is done exactly like asynchronous protection: you create an SRM protection group and configure it with datastore groups. Here are the steps to do that.
Create a new protection group
Choose the correct direction, from the source vCenter where the VMs are located to the target vCenter where they will be recovered.
The type of protection will be datastore groups and will leverage array-based replication.
Select the correct array pair that correlates to the right array manager pair.
Select the datastore group that has the VMs that need to be protected.
Note that RDMs are listed here, as are any VMFS datastores being selected, along with the list of VMs that will be protected.
That's it; the rest of the process is the same in that you choose the recovery plan and then complete the protection. From that point, the SRM workflows run just as they otherwise would with ActiveCluster, without needing storage policies or storage policy mappings.
Using Storage Policy and Datastore Tags with SRM 8.4 and lower
Both datastores are tagged so that the VMs hosted on them can be given a correct storage policy for protection in SRM. For stretched storage, only storage-policy-based discovery is supported; for details, refer to the section Configuring Site Recovery Manager Tag-Based Storage Policy Discovery.
The VMs have a tag-based policy applied that ensures they stay on these datastores.
These two datastores then are in compliance with this policy.
The datastores are presented to both vCenters.
They are presented via paths to the FlashArray flasharray-m50-1 on vCenter-01 and through flasharray-m50-2 on vCenter-02.
The datastores (and the storage policy called SRM-tag-m50_1-m50_2) are added to an SRM protection group called srmpg-01-stretched.
This protects all of the VMs on those datastores (and any that are later assigned to that policy).
Note that one consistency group is reported for each datastore. So if there are two datastores, there will be two consistency groups, and so on. This is because the FlashArray SRA does not advertise consistency groups back up to SRM, so SRM puts each datastore in its own group. This provides the greatest flexibility of failover when it comes to granularity, but it also means it is not supported to have any VM span more than one datastore. Each VM must be on only one datastore to be protected by SRM and stretched storage.
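The one-datastore-per-VM rule can be checked mechanically before configuring protection. The following Python sketch uses a hypothetical inventory dictionary; a real check would pull disk placement from vCenter (for example via pyVmomi or PowerCLI), which is not shown here.

```python
# Sketch: flag VMs whose disks span more than one datastore, which is
# unsupported for SRM stretched storage protection. The inventory below
# is a hypothetical example, not queried from vCenter.

def find_spanning_vms(vm_datastores):
    """Return the names of VMs placed on more than one datastore."""
    return sorted(vm for vm, stores in vm_datastores.items()
                  if len(set(stores)) > 1)

inventory = {
    "vm-app-01": ["srmDS-01-stretched"],
    "vm-app-02": ["srmDS-02-stretched"],
    "vm-bad-03": ["srmDS-01-stretched", "srmDS-02-stretched"],  # unsupported
}

print(find_spanning_vms(inventory))  # ['vm-bad-03']
```

Any VM reported by such a check would need its disks consolidated onto a single datastore before it can be protected.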
Lastly, the storage policy assigned to the VMs on vCenter-01 is called SRM-tag-m50_1-m50_2 which maps to SRM-tag-m50_2-m50_1 on vCenter-02 and that mapping is configured in SRM.
When the VMs are recovered on vCenter-02, they will have the mapped policy applied.
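The direction of the policy mapping can be pictured as a simple lookup. This sketch uses the policy names from the example environment and is purely illustrative; in SRM, the mapping is configured in the Storage Policy Mappings UI.

```python
# Sketch: SRM storage policy mappings as a lookup table, using the
# policy names from this example environment.

policy_map = {
    # vCenter-01 policy     ->  vCenter-02 policy
    "SRM-tag-m50_1-m50_2": "SRM-tag-m50_2-m50_1",
}

def recovered_policy(source_policy):
    """Policy applied to a VM after recovery on the paired vCenter."""
    return policy_map[source_policy]

print(recovered_policy("SRM-tag-m50_1-m50_2"))  # SRM-tag-m50_2-m50_1
```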
While the configuration of SRM protection for stretched VMs in 8.5 does not require storage policies, tags, or categories, the rest of the workflows operate exactly the same way. The rest of this KB can still be followed in a similar manner unless otherwise noted with 8.5+ or 8.4 and lower.
A test recovery is a slightly different concept with the SRM stretched storage feature, as the workflow differs from that of a migration/storage operation.
A recovery (more on this in a bit) is either:
- Cross vCenter vMotion
- Register and Power-on of VMs on target vCenter
In both cases there is no copy of the datastore; the VMs are recovered on the exact datastores they are on in the source vCenter. This makes it impossible to fully test the entire workflow. Instead, the SRM test recovery is focused on VM recovery order, reconfiguration, placement, connectivity, relationships, and of course any scripts/customizations you might have in the recovery plan.
If you plan on leveraging the Cross vCenter vMotion feature, you should attempt this process manually with a VM configured like the protected VM (stretched volume), as described in the next section, in addition to the official SRM test recovery process.
At this time, cross vCenter vMotion is not supported with RDMs in SRM 8.5.
Testing Cross vCenter vMotion
For specific requirements for this feature, refer to the following VMware KB.
Right-click on a test VM and choose Migrate.
Choose "Change compute resource only" and click Next.
Then choose a destination host or cluster.
It is important to pay attention to the Compatibility box; if a prerequisite is missing, it will be listed there. If there are no errors or warnings, click Next. If there are warnings, check the VMware KB linked above for requirements. Common causes are a missing vMotion vmkernel port on the target hosts, incorrect network names, incompatible CPUs, or a VM resource that is not present in the target site.
Complete the wizard and verify a successful vMotion.
Lastly, verify the networking is correct for the target VM. Perform the vMotion in the opposite direction to confirm that the reverse migration also succeeds.
If the VM you tested is part of the SRM recovery plan, it is important to remember to re-apply the storage policy to it when you vMotion it back. The vMotion process does not preserve the policies across vCenters.
Testing the SRM Recovery Plan
To test the recovery plan, click Test on the plan in SRM.
Unlike with active/passive storage, the "Replicate recent changes to the recovery site" option has no effect with stretched storage, as the data is always in sync. If, however, the recovery plan includes SRM protection groups with datastores protected via asynchronous replication, the option is applicable. For a purely stretched recovery plan, it does not need to be selected (though it does no harm being enabled).
The test recovery process will create a copy of all of the volumes in the recovery plan with a prefix of the pod name and a suffix of "-puresra-testFailover".
The datastores will be mounted and resignatured with a prefix of "snap-XXXXXXXX-" in the name. This prefix can be automatically removed via the SRM advanced settings discussed in Site Recovery Manager Advanced Options.
Verify the recovery plan result.
When done, click Cleanup in SRM.
The VMs and corresponding datastores will be removed and the volumes will be destroyed and eradicated on the FlashArray.
A recovery operation will attempt a vMotion, if applicable, of the virtual machines in the stretched protection group/recovery plan.
To verify a VM is eligible for vMotion, click on the SRM protection group and then the Virtual Machines tab. The VM will have Yes in the vMotion Eligible column.
To disable or override vMotion for a given VM in the recovery plan, click on the VM and choose Configure Recovery.
Deselect the Use vMotion for planned migration box if it is preferred for SRM to not attempt to vMotion that VM.
To start a recovery, click Run.
When the recovery wizard is initiated, you have one more option to disable vMotion for all VMs in the plan if you choose. The default is to attempt it for all eligible VMs.
If you choose to vMotion:
It is highly advisable to pre-connect the datastores on the target ESXi hosts prior to recovery or the recovery plan will hang until you connect and mount the datastores manually.
SRM will, if enabled, attempt to vMotion all applicable VMs. For all others, it will unregister them on the source vCenter, and then register and power them on within the target vCenter according to SRM resource mappings.
If the datastores are not presented prior to recovery and vMotion is enabled, the recovery plan will hang on the step Prepare stretched storage for VM migration at the protected site waiting for the storage as it will not be able to vMotion the VMs.
You can manually connect the datastores during the recovery and rescan the hosts to allow the process to progress. The ideal option though is to connect them prior to recovery.
If vMotion is not chosen:
If vMotion is deselected during the recovery, it is disabled for all VMs, or the environment does not support cross-vCenter vMotion, it is not required to have volumes preconnected to the target ESXi hosts. In this case, the SRA will connect them automatically during the Configure recovery site storage step.
The datastores are not automatically attached until a later step in the recovery plan. If the datastores are not connected (and it is preferred to have the SRA connect them as needed), vMotion attempts must be disabled for the respective VMs or for the entire recovery plan. To ensure that vMotion is not attempted for an entire plan, deselect the option when starting the recovery.
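The two paths above reduce to a small decision table. The following Python sketch is purely illustrative of the behavior described in this section; the strings are paraphrases, not SRM output.

```python
# Sketch: recovery behavior as a function of whether vMotion is enabled
# and whether the datastores were pre-connected on the target hosts.

def recovery_behavior(vmotion_enabled, datastores_preconnected):
    """Summarize the outcome of running the recovery plan."""
    if vmotion_enabled and not datastores_preconnected:
        # The plan waits at 'Prepare stretched storage for VM migration
        # at the protected site' until the datastores are connected.
        return "hangs waiting for storage"
    if vmotion_enabled:
        return "vMotion eligible VMs, restart the rest"
    # With vMotion disabled, pre-connecting is not required: the SRA
    # connects the volumes during 'Configure recovery site storage'.
    return "SRA connects volumes, VMs restarted on target"

print(recovery_behavior(True, False))  # hangs waiting for storage
```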
Note that no operation occurs on the FlashArray for the entire recovery. Verify the recovery.
Note that unlike an active/passive SRM recovery, the datastores remain mounted on the source vCenter.
The reprotect operation also does not interact with the storage; it simply reverses the SRM protection group and recovery plan. To reprotect, click Reprotect in SRM.
Confirm the operation in the wizard and click Finish.
An SRM failback is identical in every way to an original recovery for stretched storage. Ensure the proper resource mappings exist and the datastores are mounted in the target site to enable vMotion (if desired). Run the recovery plan normally to fail back to the original vCenter.