Overview of Stretched Storage in SRM
VMware Site Recovery Manager supports two main modes of array-based replication:
- Active/Passive: SRM coordinates with the underlying array-based replication to create copies of datastores or RDMs on the remote site. VMs are shut down and restarted on the target site. The Recovery Point Objective is not guaranteed to be zero. SRM attempts to synchronize any changes during a failover, but this may not be possible if the source site is lost. The Recovery Time Objective is always non-zero, and its length depends on how much data needs to be synchronized and how large the recovery is.
- Active/Active: SRM coordinates migrating VMs between vCenter environments. This implementation requires synchronous replication (all data is always in both sites) as well as active-active replication (the data is available in both sites at the same time). SRM will attempt to vMotion VMs to the remote vCenter if possible; if not, it will restart them on the second site. The Recovery Point Objective is always zero, and the Recovery Time Objective may be zero (generally in a migration or disaster avoidance scenario) or non-zero in certain disaster recovery events.
For Pure Storage, this means the use of ActiveCluster. ActiveCluster is enabled when a volume is placed in a pod and that pod is stretched to a second array. Any volume in a stretched pod is now both synchronously replicated and available in an active/active state (both FlashArrays can simultaneously service reads and writes for those volumes).
The 3.1 release of the Pure Storage SRA also supports recovery from a stretched pod to a third FlashArray. This does not leverage the stretched storage support in SRM as the recovery is an active/passive replication scenario. For details on that behavior see the following section.
For the purpose of this document, the following example environment will be used.
Two volumes in a stretched pod called srmPod.
The pod is stretched over two physical arrays, flasharray-m50-1 and flasharray-m50-2. On each of these volumes is a VMFS datastore (note that RDMs are not supported by SRM for use with stretched storage), named srmDS-01-stretched and srmDS-02-stretched.
Both datastores are tagged so that the VMs hosted on them can be given a correct storage policy for protection in SRM. For stretched storage, only storage-policy-based discovery is supported; for details, refer to the section Configuring Site Recovery Manager Tag-Based Storage Policy Discovery.
The VMs have a tag-based policy applied that ensures they stay on these datastores.
These two datastores then are in compliance with this policy.
The datastores are presented to both vCenters.
They are presented via paths to the FlashArray flasharray-m50-1 on vCenter-01 and through flasharray-m50-2 on vCenter-02.
The datastores (and the storage policy called SRM-tag-m50_1-m50_2) are added to an SRM protection group called srmpg-01-stretched, which protects all of the VMs on those datastores (and any VMs that are later added to that policy).
Note that for each datastore there will be one consistency group reported. So if there are two datastores, there will be two consistency groups, and so on. This is because the FlashArray SRA does not advertise consistency groups back up to SRM, so SRM puts each datastore in its own group. This provides the greatest flexibility of failover when it comes to granularity, but it also means that a VM spanning more than one datastore is not supported. Each VM must be on only one datastore to be protected by SRM and stretched storage.
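The one-datastore-per-VM rule above can be checked before building the protection group. The following is a minimal Python sketch and not part of SRM or the SRA: the inventory dictionary, the function names, and the VM names are hypothetical, and in practice the data would come from your own vCenter inventory export.

```python
from typing import Dict, List, Set


def vms_spanning_datastores(vm_datastores: Dict[str, Set[str]]) -> List[str]:
    """Return the names of VMs whose files live on more than one datastore."""
    return sorted(vm for vm, stores in vm_datastores.items() if len(stores) > 1)


def consistency_group_count(vm_datastores: Dict[str, Set[str]]) -> int:
    """One consistency group is reported per datastore, so count distinct datastores in use."""
    all_stores: Set[str] = set()
    for stores in vm_datastores.values():
        all_stores |= stores
    return len(all_stores)


# Hypothetical inventory: VM name -> datastores holding its files.
inventory = {
    "app-vm-01": {"srmDS-01-stretched"},
    "app-vm-02": {"srmDS-02-stretched"},
    "db-vm-01": {"srmDS-01-stretched", "srmDS-02-stretched"},
}

# VMs listed here would need to be consolidated onto a single datastore first.
unsupported = vms_spanning_datastores(inventory)
groups = consistency_group_count(inventory)
```

In this hypothetical inventory, db-vm-01 is flagged because it sits on both datastores, and the two datastores correspond to two consistency groups.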
Lastly, the storage policy assigned to the VMs on vCenter-01 is called SRM-tag-m50_1-m50_2 which maps to SRM-tag-m50_2-m50_1 on vCenter-02 and that mapping is configured in SRM.
When the VMs are recovered in vCenter-02 they will have the mapped policy applied.
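The policy mapping can be pictured as a simple lookup table. The sketch below is illustrative only: the policy names come from the example environment, while the dictionary and the function names are hypothetical and do not represent an SRM API.

```python
from typing import Dict

# Key: policy name on vCenter-01. Value: mapped policy name on vCenter-02.
POLICY_MAPPINGS: Dict[str, str] = {
    "SRM-tag-m50_1-m50_2": "SRM-tag-m50_2-m50_1",
}


def mapped_policy(source_policy: str, mappings: Dict[str, str]) -> str:
    """Return the policy a VM should carry after recovery to the other vCenter.

    Raises KeyError when no mapping exists, mirroring a mapping that
    still needs to be configured in SRM.
    """
    if source_policy not in mappings:
        raise KeyError(f"no storage policy mapping for {source_policy!r}")
    return mappings[source_policy]


def reverse_mappings(mappings: Dict[str, str]) -> Dict[str, str]:
    """Illustrative lookup for the opposite direction (vCenter-02 to vCenter-01)."""
    return {target: source for source, target in mappings.items()}


recovered_policy = mapped_policy("SRM-tag-m50_1-m50_2", POLICY_MAPPINGS)
```

A lookup for an unmapped policy raises an error here on purpose, so that a gap in the mappings surfaces before a recovery rather than during one.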
A test recovery is a slightly different concept with the SRM stretched storage feature, as the workflow from a migration/storage operation is not quite the same.
A recovery (more on this in a bit) is either:
- Cross vCenter vMotion
- Register and Power-on of VMs on target vCenter
In both cases there is no copy of the datastore: the VMs are recovered on the exact datastores they are on in the source vCenter. This makes it impossible to really test the entire workflow. Instead, the SRM test recovery is focused on VM recovery order, reconfiguration, placement, connectivity, relationships, and of course any scripts/customizations you might have in the recovery plan.
If you plan to leverage the Cross vCenter vMotion feature, you should attempt this process manually with a VM configured like the protected VM (stretched volume), as described in the next section, in addition to running the official SRM test recovery process.
Testing Cross vCenter vMotion
For specific requirements for this feature, refer to the following VMware KB.
Right-click on a test VM and choose Migrate.
Choose "Change compute resource only" and click Next.
Then choose a destination host or cluster.
It is important to pay attention to the Compatibility box. If there is a prerequisite missing it will be listed there. If there are no errors or warnings, click Next. If there are warnings, check the VMware KB linked above for requirements. Common reasons are no vMotion vmkernel port on the target hosts, incorrect network names, incompatible CPUs, or a VM resource that is not present in the target site.
Complete the wizard and verify a successful vMotion.
Lastly, verify the networking is correct for the target VM. Perform the vMotion in the opposite direction to confirm backwards compatibility.
If the VM you tested is part of the SRM recovery plan, it is important to remember to re-apply the storage policy to it when you vMotion it back. The vMotion process does not preserve the policies across vCenters.
Testing the SRM Recovery Plan
To test the recovery plan, click Test on the plan in SRM.
Unlike with active/passive storage, the "Replicate recent changes to the recovery site" option has no effect with stretched storage because the data is always in sync. It does apply if the recovery plan also contains SRM protection groups with datastores protected via asynchronous replication. For a purely stretched recovery plan it does not need to be selected, though leaving it enabled does no harm.
The test recovery process will create a copy of all of the volumes in the recovery plan with a prefix of the pod name and a suffix of "-puresra-testFailover".
The datastores will be mounted and resignatured, with a prefix of "snap-XXXXXXXX-" added to the name. This prefix can be removed automatically through the SRM advanced settings discussed in Site Recovery Manager Advanced Options.
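The naming conventions used during a test recovery can be recognized with a few lines of Python, which may help when auditing what a test left behind. This is a sketch under stated assumptions: each X in "snap-XXXXXXXX-" is assumed to be a hexadecimal digit, the example volume name is hypothetical (only the suffix is taken from this document), and the function names are made up for illustration.

```python
import re

# Suffix applied to the volume copies created for a test recovery.
TEST_VOLUME_SUFFIX = "-puresra-testFailover"

# Assumption: each X in "snap-XXXXXXXX-" is a hexadecimal digit.
SNAP_PREFIX = re.compile(r"^snap-[0-9a-fA-F]{8}-")


def is_test_failover_volume(volume_name: str) -> bool:
    """True when the volume name carries the test-failover suffix."""
    return volume_name.endswith(TEST_VOLUME_SUFFIX)


def original_datastore_name(datastore_name: str) -> str:
    """Strip a leading resignature prefix, if present, to recover the original name."""
    return SNAP_PREFIX.sub("", datastore_name, count=1)


# Hypothetical names for illustration only.
resignatured = "snap-1a2b3c4d-srmDS-01-stretched"
original = original_datastore_name(resignatured)
leftover = is_test_failover_volume("srmPod-vol01-puresra-testFailover")
```

Names without the prefix pass through unchanged, so the helper is safe to apply to a mixed list of datastores.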
Verify the recovery plan result.
When done, click Cleanup in SRM.
The VMs and corresponding datastores will be removed and the volumes will be destroyed and eradicated on the FlashArray.
A recovery operation will attempt to vMotion the virtual machines in the stretched protection group/recovery plan where applicable.
To verify a VM is eligible for vMotion, click on the SRM protection group and then the Virtual Machines tab. The VM will have Yes in the vMotion Eligible column.
To disable or override vMotion for a given VM in the recovery plan, click on the VM and choose Configure Recovery.
Deselect the Use vMotion for planned migration box if it is preferred for SRM to not attempt to vMotion that VM.
To start a recovery, click Run.
When the recovery wizard is initiated, you have one more option to disable vMotion for all VMs in the plan if you choose. The default is to attempt it for all eligible VMs.
If you choose to vMotion:
It is highly advisable to pre-connect the datastores on the target ESXi hosts prior to recovery or the recovery plan will hang until you connect and mount the datastores manually.
SRM will, if enabled, attempt to vMotion all applicable VMs. For all others, it will unregister them on the source vCenter, and then register and power them on within the target vCenter according to SRM resource mappings.
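The choice between the two recovery paths can be summarized as a small decision function. This is a simplified model of the behavior described above, not SRM's implementation: the class, field, and function names are hypothetical, and real eligibility is determined by SRM itself.

```python
from dataclasses import dataclass
from enum import Enum


class RecoveryAction(Enum):
    VMOTION = "cross-vCenter vMotion"
    REGISTER_AND_POWER_ON = "unregister at source, register and power on at target"


@dataclass
class ProtectedVm:
    name: str
    vmotion_eligible: bool    # the "vMotion Eligible" column in the protection group
    use_vmotion: bool = True  # per-VM "Use vMotion for planned migration" setting


def plan_action(vm: ProtectedVm, plan_vmotion_enabled: bool) -> RecoveryAction:
    """vMotion is attempted only when the plan, the VM setting, and eligibility all allow it."""
    if plan_vmotion_enabled and vm.vmotion_eligible and vm.use_vmotion:
        return RecoveryAction.VMOTION
    return RecoveryAction.REGISTER_AND_POWER_ON


# Hypothetical VMs for illustration.
vms = [
    ProtectedVm("app-vm-01", vmotion_eligible=True),
    ProtectedVm("app-vm-02", vmotion_eligible=True, use_vmotion=False),
    ProtectedVm("app-vm-03", vmotion_eligible=False),
]
actions = {vm.name: plan_action(vm, plan_vmotion_enabled=True) for vm in vms}
```

With vMotion left enabled for the plan, only app-vm-01 would be migrated live in this model; the other two would be unregistered at the source and restarted at the target.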
If the datastores are not presented prior to recovery and vMotion is enabled, the recovery plan will hang on the step Prepare stretched storage for VM migration at the protected site waiting for the storage as it will not be able to vMotion the VMs.
You can manually connect the datastores during the recovery and rescan the hosts to allow the process to progress. The ideal option though is to connect them prior to recovery.
If vMotion is not chosen:
If vMotion is deselected during the recovery, it is disabled for all VMs, or the environment does not support cross-vCenter vMotion, it is not required to have volumes preconnected to the target ESXi hosts. In this case, the SRA will connect them automatically during the Configure recovery site storage step.
If the datastores are not present and vMotion is enabled, the recovery plan will hang on the step Prepare stretched storage for VM migration at the protected site waiting for the storage as it will not be able to vMotion the VMs.
The datastores are not automatically attached until a later step in the recovery plan. If the datastores are not connected (and it is preferred to have the SRA connect them as needed), vMotion attempts must be disabled for the respective VMs, or for the entire recovery plan. To ensure that vMotion is not attempted for an entire plan, deselect the option when starting the recovery.
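A pre-flight check along these lines can show whether a plan with vMotion enabled would stall waiting for storage. The sketch is purely illustrative: the host names are hypothetical, the function names are made up, and the mount information would have to be gathered from the target ESXi hosts by whatever tooling you already use.

```python
from typing import Dict, List, Set


def missing_datastores(
    required: Set[str], host_mounts: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """Map each target host to the required datastores it does not have mounted."""
    gaps: Dict[str, List[str]] = {}
    for host, mounted in host_mounts.items():
        absent = sorted(required - mounted)
        if absent:
            gaps[host] = absent
    return gaps


def vmotion_is_safe(required: Set[str], host_mounts: Dict[str, Set[str]]) -> bool:
    """True when every target host already sees every required datastore."""
    return not missing_datastores(required, host_mounts)


required = {"srmDS-01-stretched", "srmDS-02-stretched"}

# Hypothetical target hosts and the datastores each currently has mounted.
target_hosts = {
    "esxi-b-01": {"srmDS-01-stretched", "srmDS-02-stretched"},
    "esxi-b-02": {"srmDS-01-stretched"},
}

gaps = missing_datastores(required, target_hosts)
safe = vmotion_is_safe(required, target_hosts)
```

If the check reports gaps, either connect the datastores to those hosts before running the plan or deselect vMotion so the SRA connects them during the recovery.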
Note that there is no operation on the FlashArray for the entire recovery. Verify the recovery.
Note that unlike an active/passive SRM recovery, the datastores remain mounted on the source vCenter.
The reprotect operation also does not interact with the storage; it just reverses the SRM protection group and recovery plan. To reprotect, click Reprotect in SRM.
Confirm the operation in the wizard and click Finish.
An SRM failback is identical in every way to an original recovery for stretched storage. Ensure the proper resource mappings exist and the datastores are mounted in the target site to enable vMotion (if desired). Run the recovery plan normally to fail back to the original vCenter.
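The recover, reprotect, and failback cycle can be modeled as state transitions to make the direction changes explicit. This is a conceptual sketch only: the class and function names are hypothetical, and the final reprotect is included just to show the state returning to its starting point.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ProtectionState:
    protected_site: str  # site SRM currently treats as the protected side
    recovery_site: str   # site a recovery would move the VMs to
    vms_running_at: str  # site where the VMs are currently running


def recover(state: ProtectionState) -> ProtectionState:
    """A recovery moves the VMs to the recovery site; the storage is not changed."""
    return replace(state, vms_running_at=state.recovery_site)


def reprotect(state: ProtectionState) -> ProtectionState:
    """Reprotect only reverses the direction of the protection group and plan."""
    return replace(
        state,
        protected_site=state.recovery_site,
        recovery_site=state.protected_site,
    )


initial = ProtectionState("vCenter-01", "vCenter-02", vms_running_at="vCenter-01")

after_recovery = recover(initial)              # VMs now run in vCenter-02
after_reprotect = reprotect(after_recovery)    # protection now points back at vCenter-01
after_failback = recover(after_reprotect)      # failback is just another recovery
final = reprotect(after_failback)              # restore the original direction
```

In this model the failback step calls the same function as the original recovery, which reflects the point that the two operations are identical for stretched storage.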