
Web Guide: Implementing vSphere Metro Storage Cluster With ActiveCluster


Executive Summary

Business critical applications and the virtual machines hosting them often need the highest possible resiliency to ensure that business operations do not stop in the case of a disaster—either localized or site-wide.

To ensure this, the data in use by those applications needs to be spread across two arrays, often in more than one geographic location, and, importantly, must be available at both sites at the same time. To achieve this, some arrays offer synchronous replication that provides the ability to write to the same block storage volume simultaneously while maintaining write order. This is traditionally called Active-Active replication.

The Pure Storage FlashArray introduced Active-Active replication in the Purity 5.0.0 release.

In VMware vSphere environments, a common use-case for Active-Active replication is with the VMware vSphere High Availability offering. Together, this solution is called VMware vSphere Metro Storage Cluster (vMSC). The combination of these features provides the best possible Recovery Time Objective (RTO) and Recovery Point Objective (RPO) for vSphere environments.

This paper overviews configuration and best practices for using the Pure Storage FlashArray ActiveCluster feature with the vSphere Metro Storage Cluster solution.

For specific best practices for individual features, please refer to Pure Storage or VMware documentation as the case may be.

Audience

This paper is intended for storage, VMware, network or other administrators who plan to implement Pure Storage FlashArray ActiveCluster with VMware vSphere Metro Storage Cluster.

As always, for specific questions or requests for assistance, please reach out to Pure Storage support at any time at support@purestorage.com.

Important Notes

Before setting up ActiveCluster and VMware vSphere Metro Storage Cluster it is important to read the ActiveCluster specific documentation and VMware’s own documentation:

  1. https://storagehub.vmware.com/#!/vsphere-storage/vmware-vsphere-r-metro-storage-cluster-recommended-practices
  2. https://support.purestorage.com/FlashArray/PurityFA/Protect/Replication/ActiveCluster_Requirements_and_Best_Practices
  3. https://support.purestorage.com/FlashArray/PurityFA/Protect/Replication/ActiveCluster_Solution_Overview

Ensure that the best practices stated above have been followed; this document assumes that the aforementioned documentation has been read beforehand.

Furthermore, after January 2016 VMware retired the vMSC solution program and replaced it with the Partner Verified and Supported Products (PVSP) listing.

For reference, the VMware Compatibility Guide no longer updates vMSC listings for vSphere releases after this date (everything after 6.0 U1) and includes the following note:

Looking for Metro Storage Cluster (vMSC) solutions listed under PVSP?

vMSC was EOLed in late 2015. You can find more information about vMSC EOL in this KB article.

vMSC solution listing under PVSP can be found on our Partner Verified and Supported Products listing.

VMware KB on the program change: https://kb.vmware.com/s/article/52496

For official FlashArray ActiveCluster PVSP listing, refer to the following link:

https://www.vmware.com/resources/compatibility/vcl/partnersupport.php

As well as the VMware-hosted knowledge base article on ActiveCluster:

https://kb.vmware.com/s/article/51656

Solution Overview and Introduction

The most robust, resilient, and automated solution for critical data protection and availability combines three technologies:

  1. VMware vSphere High Availability—VMware’s vCenter feature for automated restart of virtual machines on another VMware ESXi host after a service interruption.
  2. Pure Storage FlashArray ActiveCluster—a simple, built-in, active-active synchronous storage replication solution for FlashArray block storage.
  3. VMware vSphere Metro Storage Cluster—a VMware vCenter solution combining active-active storage and ESXi servers spread across geographic areas in a single vCenter cluster.

VMware vSphere High Availability

VMware High Availability (HA) is a technology that provides cluster-based monitoring of virtual machines running on included ESXi hosts. If a failure of the storage, network, or host occurs, the remaining ESXi hosts coordinate to restart the affected virtual machines on appropriate hosts. Through network- and storage-based heartbeating, the ESXi hosts can detect and respond to a variety of failure or isolation events to provide the fastest automated recovery of virtual machines.

VMware HA offers a solution to protect applications running inside virtual machines that do not offer application-based high-availability or application cluster configurations. VMware HA, however, does not preclude the use of those features if the applications offer them. They can actually offer additional benefits on top of one another, though a detailed discussion on that topic is beyond the scope of this paper.

VMware HA is made possible via shared storage. Without shared storage, VMs and their data cannot be seen by surviving hosts and therefore a disaster restart operation is not an option. For this reason, it is a general best practice to provision storage identically and simultaneously to all hosts in a cluster—this allows any host in the cluster to run any virtual machine.

VMware vSphere Metro Storage Cluster

VMware vSphere Metro Storage Cluster (vMSC) is a feature that extends VMware HA with active-active stretched storage. VMware HA, as stated in the previous section, requires shared storage to be presented to all hosts in a cluster to enable restart of virtual machines in the event of a disaster.

In many scenarios, a cluster might have hosts in two entirely separate datacenters or geographical locations. For VMware HA to restart a VM on a host in the cluster that is in a different datacenter, that host must see that storage too. There are a few options to achieve this.

Scenario 1: Two-site host cluster, single-site storage

It is possible to cross-connect a storage array in one datacenter to hosts in its own datacenter and a second datacenter. Therefore, storage is only provided by one site. If the power goes out (for example) in datacenter A, no hosts will have access to the storage. Consequently, a one-site storage configuration does not really provide much additional resiliency.

ac1.png

Figure 1. Scenario 1: Stretched host cluster with non-stretched/single-site storage

Scenario 2: Two-site host cluster, single-site storage in a third site

Another option is to put an array in a third site and provide the hosts in the two other sites access to that storage. While there are many obvious problems with this approach, the main one is that it does not protect against failure of the array or a network partition of site C. Either of these failures will cause all hosts to lose access.

ac2.png

Figure 2. Scenario 2: Stretched host cluster with non-stretched/single-site storage in third site

The third option is stretched storage. In this situation, a storage array is in site A and another array is in site B. Both sites have hosts that can see one or both arrays. One or more storage volumes are created on one array and an exact synchronous copy is created on the second array. All writes are synchronously mirrored between the arrays so the volume appears as the same volume in both sites and can be read from and written to simultaneously. If hosts in datacenter A fail (or the storage, or the entire datacenter), hosts in datacenter B can take over their workloads by using the array in datacenter B. This is the scenario that will be discussed in this paper.

ac3.png

Figure 3. Scenario 3: Stretched host cluster with stretched/dual-site storage

The combination of stretched storage AND stretched ESXi clusters provides an extremely high level of resiliency to an infrastructure. When the automatic failover and recovery features offered by VMware vSphere HA are enabled on top of these two topologies, RTO is reduced even further as VMware can quickly and intelligently respond and react to host, storage, or site failures to bring virtual machines back online.

ActiveCluster

Pure Storage® Purity ActiveCluster is a fully symmetric active/active bidirectional replication solution that provides synchronous replication for RPO zero and automatic transparent failover for RTO zero. ActiveCluster spans multiple sites enabling clustered arrays and clustered hosts to be used to deploy flexible active/active datacenter configurations.

ac4.png

Synchronous Replication - Writes are synchronized between arrays and protected in non-volatile RAM (NVRAM) on both arrays before being acknowledged to the host.

Symmetric Active/Active - Read and write to the same volumes at either side of the mirror, with optional host-to-array site awareness.

Transparent Failover - Automatic Non-disruptive failover between synchronously replicating arrays and sites with automatic resynchronization and recovery.

Async Replication Integration - Uses async for baseline copies and resynchronizing. Convert async relationships to sync without resending data. Async data to a 3rd site for DR.

No Bolt-ons & No Licenses - No additional hardware required, no costly software licenses required, just upgrade the Purity Operating Environment and go active/active!

Simple Management - Perform data management operations from either side of the mirror, provision storage, connect hosts, create snapshots, create clones.

Integrated Pure1® Cloud Mediator - Automatically configured passive mediator that allows transparent failover and prevents split-brain, without the need to deploy and manage another component.

Components

Purity ActiveCluster is composed of three core components: The Pure1 Mediator, active/active clustered array pairs, and stretched storage containers.

ac5.png

The Pure1 Cloud Mediator - A required component of the solution that is used to determine which array will continue data services should an outage occur in the environment.

Active/Active Clustered FlashArrays - Utilize synchronous replication to maintain a copy of data on each array and present those as one consistent copy to hosts that are attached to either, or both, arrays.

Stretched Storage Containers - Management containers that collect storage objects such as volumes into groups that are stretched between two arrays.

Administration

ActiveCluster introduces a new management object: Pods. A pod is a stretched storage container that defines a set of objects that are synchronously replicated together and which arrays they are replicated between. An array can support multiple pods. Pods can exist on just one array or on two arrays simultaneously with synchronous replication. Pods that are synchronously replicated between two arrays are said to be stretched between arrays.

ac6.png

Pods can contain volumes, protection groups (for snapshot scheduling and asynchronous replication) and other configuration information such as which volumes are connected to which hosts. The pod acts as a consistency group, ensuring that multiple volumes within the same pod remain write order consistent.

Pods also provide volume namespaces; that is, different volumes may have the same volume name if they are in different pods. In the image above, the volumes in pod3 and pod4 are different volumes from those in pod1, a stretched active/active pod. This allows migration of workloads between arrays or consolidation of workloads from multiple arrays to one, without volume name conflicts.

Mediator

Transparent failover between arrays in ActiveCluster is automatic and requires no intervention from the storage administrator. Failover occurs within standard host I/O timeouts similar to the way failover occurs between two controllers in one array during non-disruptive hardware or software upgrades.

ActiveCluster is designed to provide maximum availability across symmetric active/active storage arrays while preventing a split-brain condition from occurring. Split brain is the condition in which two arrays might serve I/O to the same volume without keeping the data in sync between the two arrays.

Any active/active synchronous replication solution designed to provide continuous availability across two different sites requires a component referred to as a witness or voter to mediate failovers while preventing split brain. ActiveCluster includes a simple to use, lightweight, and automatic way for applications to transparently failover, or simply move, between sites in the event of a failure without user intervention: The Pure1 Cloud Mediator.

The Pure1 Cloud Mediator is responsible for ensuring that only one array is allowed to stay active for each pod when there is a loss of communication between the arrays.

In the event that the arrays can no longer communicate with each other over the replication interconnect, both arrays will pause I/O and reach out to the mediator to determine which array can stay active for each sync replicated pod. The first array to reach the mediator is allowed to keep its synchronously replicated pods online. The second array to reach the mediator must stop servicing I/O to its synchronously replicated volumes, in order to prevent split brain. The entire operation occurs within standard host I/O timeouts to ensure that applications experience no more than a pause and resume of I/O.

The Pure1® Cloud Mediator

A failover mediator must be located in a 3rd site that is in a separate failure domain from either site where the arrays are located. Each array site must have independent network connectivity to the mediator such that a single network outage does not prevent both arrays from accessing the mediator. The mediator should also be a very lightweight and easy-to-administer component of the solution. The Pure Storage solution provides this automatically by utilizing an integrated cloud-based mediator. The Pure1 Cloud Mediator provides two main functions:

  • Prevent a split brain condition from occurring where both arrays are independently allowing access to data without synchronization between arrays.
  • Determine which array will continue to service IO to synchronously replicated volumes in the event of an array failure, replication link outage, or site outage.

The Pure1 Cloud Mediator has the following advantages over the typical heavyweight voter or witness components used in non-Pure solutions:

  • SaaS operational benefits - As with any SaaS solution the operational maintenance complexity is removed: nothing to install onsite, no hardware or software to maintain, nothing to configure and support for HA, no security patch updates, etc.
  • Automatically a 3rd site - The Pure1 Cloud Mediator is inherently in a separate failure domain from either of the two arrays.
  • Automatic configuration - Arrays configured for synchronous replication will automatically connect to and use the Pure1 Cloud Mediator.
  • No misconfiguration - With automatic and default configuration there is no risk that the mediator could be incorrectly configured.
  • No human intervention - A significant number of issues in non-Pure active/active synchronous replication solutions, particularly those related to accidental split brain, are related to human error. Pure’s automated non-human mediator eliminates operator error from the equation.
  • Passive mediation - Continuous access to the mediator is not required for normal operations. The arrays will maintain a heartbeat with the mediator, however if the arrays lose connection to the mediator they will continue to synchronously replicate and serve data as long as the replication link is active.

On-Premises Failover Mediator

Failover mediation for ActiveCluster can also be provided using an on-premises mediator distributed as an OVF file and deployed as a VM. Failover behaviors are exactly the same as described above. The on-premises mediator simply replaces the role of the Pure1 Cloud Mediator during failover events.

The on-premises mediator has the following basic requirements:

  • The on-premises mediator can only be deployed as a VM on virtualized hardware. It is not installable as a stand-alone application.
  • High Availability for the mediator must be provided by the hosts on which the mediator is deployed. For example, using VMware HA, or Microsoft Hyper-V HA Clustering.
  • Storage for the mediator must not allow the configuration of the mediator to be rolled back to previous versions. This applies to situations such as storage snapshot restores, or cases where the mediator might be stored on mirrored storage.
  • The arrays must be configured to use the on-premises mediator rather than the Pure1 Cloud Mediator.
  • The mediator must be deployed in a third site, in a separate failure domain that will not be affected by any failures in either of the sites where the arrays are installed.
  • Both array sites must have independent network connections to the mediator such that a failure of one network connection does not prevent both arrays from accessing the mediator.

Uniform and Non-Uniform Access

Hosts can be configured to see just the array that is local to them, or to see both arrays. The former option is called non-uniform vMSC; the latter is referred to as uniform vMSC.

A uniform storage access model can be used in environments where there is host-to-array connectivity of either FC or ethernet (for iSCSI), and array-to-array ethernet connectivity, between the two sites. When deployed in this way a host has access to the same volume through both the local array and the remote array. The solution supports connecting arrays with up to 5ms of round trip time (RTT) latency between the arrays.

ac7.png

The image above represents the logical paths that exist between the hosts and arrays, and the replication connection between the two arrays in a uniform access model. Because a uniform storage access model allows all hosts, regardless of site location, to access both arrays there will be paths with different latency characteristics. Paths from hosts to the local array will have lower latency; paths from each local host to the remote array will have higher latency.

For the best performance in active/active synchronous replication environments, hosts should be prevented from using paths that access the remote array unless necessary. For example, in the image below, if VM 2A were to perform a write to volume A over the host-side connection to array A, that write would incur 2X the latency of the inter-site link, 1X for each traverse of the network. The write would experience 5ms of latency for the trip from host B to array A and another 5ms of latency while array A synchronously sends the write back to array B.

ac8.png

With Pure Storage Purity ActiveCluster there are no such management headaches. ActiveCluster does make use of ALUA to expose paths to local hosts as active/optimized paths and expose paths to remote hosts as active/non-optimized. However, there are two advantages in the ActiveCluster implementation.

  • In ActiveCluster volumes in stretched pods are read/write on both arrays. There is no such thing as a passive volume that cannot service both reads and writes.
  • The optimized path is defined on a per host-to-volume connection basis using a preferred-array option; this ensures that regardless of what host a VM or application is running on it will have a local optimized path to that volume.

ac9.png

ActiveCluster enables truly active/active datacenters and removes the concern around what site or host a VM runs on; the VM will always have the same performance regardless of site. While VM 1A is running on host A accessing volume A, it will use only the local optimized paths as shown in the next image.
ac10.png

If the VM or application is switched to a host in the other site, with the data left in place, only local paths will be used in the other site as shown in the next image. There is no need to adjust path priorities or migrate the data to a different volume to ensure local optimized access.

Non-Uniform Access

A non-uniform storage access model is used in environments where there is host-to-array connectivity of either FC or ethernet (for iSCSI) only locally within the same site. Ethernet connectivity for the array-to-array replication interconnect must still exist between the two sites. When deployed in this way each host has access to a volume only through the local array and not the remote array. The solution supports connecting arrays with up to 5ms of round trip time (RTT) latency between the arrays.

ac11.png

Hosts will distribute I/Os across all available paths to the storage, because only the local Active/Optimized paths are available.

Configuring ActiveCluster

A major benefit of using an ActiveCluster stretched storage solution is how simple it is to set up.

Before moving forward, ensure environment configuration requirements are followed as dictated in this KB article:

https://support.purestorage.com/FlashArray/PurityFA/Protect/Replication/ActiveCluster_Requirements_and_Best_Practices

ActiveCluster Glossary of Terms

The following terms, introduced with ActiveCluster, will be used repeatedly in this document:

  • Pod—a pod is a namespace and a consistency group. Synchronous replication can be activated on a pod, which makes all volumes in that pod present on both FlashArrays in the pod.
  • Stretching—stretching a pod is the act of adding a second FlashArray to a pod. When stretched to another array, the volume data will begin to synchronize, and when complete all volumes in the pod will be available on both FlashArrays
  • Unstretching—unstretching a pod is the act of removing a FlashArray from a pod. This can be done from either FlashArray. When removed, the volumes and the pod itself, are no longer available on the FlashArray that was removed.
  • Restretching—when a pod is unstretched, the array that was removed from the pod will keep a copy of the pod in the trash can for 24 hours. This allows the pod to be quickly re-stretched without having to resend all the data, provided it is re-stretched within that 24-hour window.

Creating a Synchronous Connection

The first step to enable ActiveCluster is to create a synchronous connection with another FlashArray. It does not matter which FlashArray is used to create the connection—either one is fine.

Login to the FlashArray Web Interface and click on the Storage section. Click either on the plus sign or on the vertical ellipsis and choose Connect Array.

ac12.png

The window that pops-up requires three pieces of information:

  1. Management address—this is the virtual IP address or FQDN of the remote FlashArray.
  2. Connection type—choose Sync Replication for ActiveCluster.
  3. Connection Key—this is an API token that can be retrieved from the remote FlashArray.

To obtain the connection key, login to the remote FlashArray Web Interface and click on the Storage section and then click on the vertical ellipsis and choose Get Connection Key.

ac13.png

Copy the key to the clipboard using the Copy button.

ac14.png

Go back to the original FlashArray Web Interface and paste in the key.

ac15.png

The replication address field may be left blank and Purity will automatically discover all the replication port addresses. If the target addresses are via Network Address Translation (NAT) then it is necessary to enter the replication port NAT addresses. 

When complete, click Connect.

If everything is valid, the connection will appear in the Connected Arrays panel.

ac16.png

If the connection fails, verify network connectivity and IP information and/or contact Pure Storage Support.
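
For administrators who prefer to script this step, the sketch below shows one way to create the same synchronous connection with the purestorage Python REST client. This is a minimal, hedged example: the management addresses, API token, connection key, and the "sync-replication" connection-type string are assumptions that should be verified against the installed client and Purity versions.

    # Minimal sketch: create a synchronous array connection with the
    # purestorage Python REST client (pip install purestorage).
    # Addresses, tokens, and the connection-type string are placeholders.
    import purestorage

    local_array = purestorage.FlashArray("flasharray-a.example.com", api_token="LOCAL-API-TOKEN")

    # The connection key is normally copied from the remote array's GUI
    # (Get Connection Key); here it is just a placeholder string.
    connection_key = "PASTE-CONNECTION-KEY-FROM-REMOTE-ARRAY"

    # Assumed signature: connect_array(management_address, connection_key, [types]).
    local_array.connect_array("flasharray-b.example.com",
                              connection_key,
                              ["sync-replication"])

    # Verify the connection from the local side.
    for conn in local_array.list_array_connections():
        print(conn)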

Creating a Pod

The next step to enable ActiveCluster is to create a consistency group. With ActiveCluster, this is called a “pod”.

A pod is both a consistency group and a namespace—in effect creating a grouping for related objects involved in ActiveCluster replication. One or more pods can be created. Pods are stretched, unstretched, and failed over together.

Therefore, the basic guidance is simply to put related volumes in the same pod. If volumes host related applications that should remain in the same datacenter together or need to be consistent with one another, put them in the same pod. Otherwise, volumes can share a pod for simplicity, or be placed in different pods if they have different requirements. See the following KB for FlashArray object limits for additional guidance:

https://support.purestorage.com/FlashArray/PurityFA/General_Troubleshooting/Pure_Storage_FlashArray_Limits

To create a pod, login to the FlashArray GUI and click on Storage, then Pods, then click on the plus sign.

ac17.png

Next, enter a name for the pod; the name can contain letters, numbers, or dashes (and must start with a letter or number). Then click Create.

ac18.png

The pod will then appear in the Pods panel.

ac19.png

To further configure the pod, click on the pod name.

ac20.png    

The default configuration for ActiveCluster is to use the Cloud Mediator—no configuration is required other than ensuring that the management network from the FlashArray is redundant (uses two NICs per controller) and has IP access to the mediator. Refer to the networking section in the below KB for more details:

https://support.purestorage.com/FlashArray/PurityFA/Protect/Replication/ActiveCluster_Requirements_and_Best_Practices

The mediator in use can be seen in the overview panel under the Mediator heading. If the mediator is listed as “purestorage” the Cloud Mediator is in use.

For sites that are unable to contact the Cloud Mediator, the ActiveCluster On-Premises Mediator is available. For deployment instructions, refer to the following article:

https://support.purestorage.com/FlashArray/PurityFA/Protect/Replication/ActiveCluster_On-Premises_Mediator_Deployment_Guide

ac21.png

Pod configuration is complete.
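
If this step is being automated, pod creation can also be scripted. The sketch below is a minimal example with the purestorage REST client; the pod name matches the one used later in this walkthrough, and the create_pod/get_pod helpers are assumptions to verify against the installed client version (pod support requires a client and Purity release that include ActiveCluster).

    # Minimal sketch: create a pod (stretched storage container) on one array.
    # The pod helper methods are assumptions to verify against the client in use.
    import purestorage

    array_a = purestorage.FlashArray("flasharray-a.example.com", api_token="LOCAL-API-TOKEN")

    array_a.create_pod("vMSC-pod01")

    # The new pod should report a single array and the mediator in use
    # (the Pure1 Cloud Mediator shows up as "purestorage").
    print(array_a.get_pod("vMSC-pod01"))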

Adding Volumes to a Pod

The next step is to add any pre-existing volumes to the pod. Once a pod has been enabled for replication, pre-existing volumes cannot be moved into the pod; only new volumes can be created in it.

To add a volume to a pod, go to the Storage screen in the FlashArray Web Interface, click on Volumes, and then click on the name of the volume to be added to the pod. The volume can be found quickly by searching for its name.

ac22.png

When the volume screen loads, click on the vertical ellipsis in the upper right-hand corner and choose Move.

ac23.png

To choose the pod, click on the Container box and choose the pod name.

ac24.png

Note that as of Purity 5.0.0, the following limitations exist with respect to moving a volume into or out of a pod:

  • Volumes cannot be moved directly between pods. A volume must first be moved out of a pod, then moved into the other pod.
  • A volume in a volume group cannot be added into a pod. It must be removed from the volume group first.
  • A volume cannot be moved out of an already stretched pod. The pod must first be unstretched, then the volume can be moved out.
  • A volume cannot be moved into an already stretched pod. The pod must first be unstretched, then the existing volume can be added, and then the pod can be re-stretched.

Some of these restrictions may relax in future Purity versions.

Choose the valid target pod and click Move.

ac25.png

This will move the volume into the pod and rename the volume to have a prefix consisting of the pod name and two colons. The volume name will then be in the format of <podname>::<volumename>.

ac26.png

The pod will list the volume under its Volumes panel.

ac27.png

Creating a New Volume in a Pod

Users can also create new volumes directly in a pod. Click on Storage, then Pods, then choose the pod. Under the Volumes panel, click the plus sign to create a new volume.

ac28.png

In the creation window, enter a valid name and a size and click Create. This can be done whether or not the pod is actively replicating.

ac29.png
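
Because the pod acts as a namespace, creating a volume inside it from a script is simply a matter of prefixing the volume name with the pod name and two colons. The sketch below is a minimal scripted equivalent of the GUI step above; the array address, token, volume name, and size are placeholders.

    # Minimal sketch: create a new volume directly inside a pod by using the
    # <podname>::<volumename> naming convention. Names and sizes are placeholders.
    import purestorage

    array_a = purestorage.FlashArray("flasharray-a.example.com", api_token="LOCAL-API-TOKEN")

    # Creates a 1 TiB volume named "vmfs-01" inside pod "vMSC-pod01".
    array_a.create_volume("vMSC-pod01::vmfs-01", "1T")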

Stretching a Pod

The next step is adding a FlashArray target to the pod. This is called “stretching” the pod, because it automatically makes the pod and its content available on the second array.

Please note that once a pod has been “stretched”, pre-existing volumes cannot be added to it until it is “un-stretched”; only new volumes can be created in a stretched pod. Therefore, if it is necessary to add existing volumes to a pod, follow the instructions in the section Adding Volumes to a Pod first, then stretch the pod.

To stretch a pod, add a second array to the pod. Do this by clicking on the plus sign in the Arrays panel. 

ac30.png

Choose a target FlashArray and click Add.

ac31.png

The arrays will immediately start synchronizing data between the two FlashArrays.

ac32.png

Active/Active storage is not available until the synchronization completes, which is indicated when the resyncing status ends and both arrays show as online.

ac33.png

On the remote FlashArray the pod and volumes will now exist in identical fashion and will be available for provisioning to a host or hosts on either FlashArray.

ac34.png
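
The stretch operation can also be scripted. The sketch below stretches the pod to a second, already-connected FlashArray and waits for the initial resync to finish; the add_pod/get_pod helpers and the response field names ("arrays", "status") are assumptions to verify against the installed purestorage client.

    # Minimal sketch: stretch pod "vMSC-pod01" to a second FlashArray.
    import time
    import purestorage

    array_a = purestorage.FlashArray("flasharray-a.example.com", api_token="LOCAL-API-TOKEN")

    # The second argument is the array name of the target FlashArray as it
    # appears in the Connected Arrays panel.
    array_a.add_pod("vMSC-pod01", "flasharray-b")

    # Poll until both arrays in the pod report online (resync complete) before
    # provisioning the pod's volumes to hosts on the second array.
    while True:
        pod = array_a.get_pod("vMSC-pod01")
        arrays = pod.get("arrays", [])
        if arrays and all(member.get("status") == "online" for member in arrays):
            break
        time.sleep(5)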

Un-Stretching a Pod

Once ActiveCluster has been enabled, replication can also be terminated by unstretching the pod. This might be done for a variety of reasons: to change the pod volume membership, for example, or because replication was only enabled temporarily to migrate the volumes from one array to another.

The act of terminating replication is called un-stretching. To un-stretch a pod, remove the array which no longer needs to host the volumes. For example, take the below pod:

ac35.png

The pod has two arrays: sn1-x70-b05-33 and sn1-x70-c05-33. Since this pod is online, both arrays offer up the volumes in the pod. If the volumes should remain only on sn1-x70-b05-33, the other FlashArray, sn1-x70-c05-33, would be removed.

Before removing a FlashArray from a pod ensure that the volumes in the pod are disconnected from any host or host group on the FlashArray to be removed from the pod. Purity will not allow a pod to be unstretched if the FlashArray chosen for removal has existing host connections to volumes in the pod.

To remove a FlashArray, choose the appropriate pod and inside of the Arrays panel, click on the trash icon next to the array to be removed.

ac36.png

When it has been confirmed that it is the proper array to remove, click the Remove button to confirm the removal.

ac37.png

On the FlashArray that was removed, under the Pods tab, the pod will be listed in the Destroyed Pods panel.

ac38.png

If the un-stretch was done in error, go back to the FlashArray Web Interface of the array that remains in the pod and add the other FlashArray back. This will return the pod from the Destroyed Pods status back to active.

The pod can be instantly re-stretched within 24 hours. After 24 hours, the removed FlashArray will permanently remove its references to the pod. Permanent removal can be forced early by selecting the destroyed pod and clicking the trash icon next to it.

ac39.png

Click Eradicate to confirm the removal.

ac40.png
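
For completeness, the un-stretch operation can be scripted as well. The sketch below removes the second array from the pod; the remove_pod helper is an assumption to verify against the installed purestorage client, and host or host group connections on the array being removed must be disconnected first, as noted above.

    # Minimal sketch: un-stretch pod "vMSC-pod01" by removing one array.
    import purestorage

    array_a = purestorage.FlashArray("flasharray-a.example.com", api_token="LOCAL-API-TOKEN")

    # Remove the second array from the pod; the pod (and its volumes) remain
    # online on flasharray-a only.
    array_a.remove_pod("vMSC-pod01", "flasharray-b")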

Configuring ESXi Hosts

Configuration of the ESXi hosts is no different than the configuration for non-ActiveCluster FlashArrays, so all of the standard FlashArray best practices still apply.

With that in mind, there are still a few things worth mentioning in this document.

Host Connectivity

A FlashArray host object is a collection of a host’s initiators that can be “connected” to a volume. This allows those specified initiators (and therefore that host) to access that volume or volumes.

Create a host object on the FlashArray by going to the Storage section and then the Hosts tab.

ac41.png

Hosts

Click on the plus sign in the Hosts panel to create a new host. Assign the host a name that makes sense and click Create.

ac42.png

Click on the newly created host and then in the Host Ports panel, click the vertical ellipsis and choose either Configure WWNs (for Fibre Channel) or Configure IQNs (for iSCSI).

ac43.png

For WWNs, if the initiator is presented on the fabric to the FlashArray (meaning zoning is complete), click on the correct WWN in the left pane to add it to the host, or alternatively click the plus sign and type it in manually. iSCSI IQNs must always be typed in manually.

ac44.png

When all the initiators are added/selected, click Add.

BEST PRACTICE: All Fibre Channel hosts should have at least two initiators for redundancy. ESXi iSCSI usually only has one initiator (IQN) but should have two or more physical NICs in the host that can talk to the FlashArray iSCSI targets.

Verify connectivity by navigating to the Health section and then the Connections tab. Find the newly created host and look at the Paths column. If it lists anything besides Redundant, investigate the reported status. 

ac45.png
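
Host object creation is also scriptable. The sketch below creates a Fibre Channel host and an iSCSI host with the purestorage REST client; the host names, WWNs, and IQN are placeholders, not values from this environment.

    # Minimal sketch: create FlashArray host objects and assign their initiators.
    import purestorage

    array_a = purestorage.FlashArray("flasharray-a.example.com", api_token="LOCAL-API-TOKEN")

    # Fibre Channel host: supply the host's WWPNs (at least two for redundancy).
    array_a.create_host("esxi-a01", wwnlist=["52:4A:93:70:00:00:00:01",
                                             "52:4A:93:70:00:00:00:02"])

    # iSCSI host: supply the host's IQN(s) instead.
    array_a.create_host("esxi-a02", iqnlist=["iqn.1998-01.com.vmware:esxi-a02-00000000"])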

Host Groups

To make it easier to provision storage to a cluster of hosts, it is recommended to put all the FlashArray host objects into a host group.

To create a host group, click on the Storage section followed by the Hosts tab. In the Host Group panel, click the plus sign.

ac46.png

Enter a name for the host group and click Create.

ac47.png

Now click on the host group in the Host Groups panel and then click on the vertical ellipsis in the Member Hosts panel and choose Add.

ac48.png

Select one or more hosts in the following screen to add to the host group.

ac49.png

If the environment is configured for uniform access, all hosts in the ESXi cluster should be configured on both FlashArrays and added to their host group. If the configuration is non-uniform, only the hosts that have direct access to the given FlashArray need to be added to that FlashArray and its corresponding host group.
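
The host group steps above can be scripted in the same way. The sketch below creates a host group and connects an ActiveCluster volume to it; the group, host, and volume names are placeholders chosen to match the uniform-access walkthrough later in this paper.

    # Minimal sketch: group ESXi hosts into a host group and connect a pod
    # volume to the group (hosts must already exist on the array).
    import purestorage

    array_a = purestorage.FlashArray("flasharray-a.example.com", api_token="LOCAL-API-TOKEN")

    array_a.create_hgroup("Uniform", hostlist=["esxi-a01", "esxi-a02",
                                               "esxi-b01", "esxi-b02"])

    # Connect an ActiveCluster volume to every host in the group at once.
    array_a.connect_hgroup("Uniform", "vMSC-pod01::vmfs-01")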

Multipathing

Standard ESXi multipathing recommendations apply; these are described in more detail in the VMware Best Practices guide.

Recommendations at a high level include the following:

  • Use the VMware Round Robin path selection policy for FlashArray storage with the I/O Operations Limit set to 1 (a quick verification sketch follows this list).
    • In ESXi 6.0 Express Patch 5 and ESXi 6.5 Update 1 and later this is a default setting in ESXi for FlashArray storage and therefore no manual configuration is required in those releases.

  • Use multiple HBAs per host for Fibre Channel or multiple NICs per host for iSCSI.
  • It is recommended to use Port Binding for Software iSCSI when possible.
  • Connect each host to both controllers.
  • In the storage or network fabric, use redundant switches.
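
As a quick way to verify the path selection policy in use, the read-only pyVmomi sketch below lists each Pure Storage LUN and its PSP on every host in a cluster. The vCenter address, credentials, and cluster name are placeholders; note that the Round Robin I/O Operations Limit itself is not exposed through this view and is normally confirmed with esxcli or the FlashArray VMware best practices documentation.

    # Minimal read-only sketch (pyVmomi): report the path selection policy of
    # every Pure Storage LUN on each host in a cluster.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                      pwd="password", sslContext=ctx)
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "vMSC-Cluster")

    for host in cluster.host:
        storage = host.configManager.storageSystem.storageDeviceInfo
        luns = {lun.key: lun for lun in storage.scsiLun}
        for mp in storage.multipathInfo.lun:
            lun = luns.get(mp.lun)
            if lun is not None and lun.vendor.strip() == "PURE":
                print(host.name, lun.canonicalName, mp.policy.policy)

    Disconnect(si)
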
Uniform Configuration

In a uniform configuration, all hosts have access to both FlashArrays and can therefore see paths for an ActiveCluster-enabled volume to each FlashArray.

To start, I will create a new VMFS volume on my FlashArray. To expedite the process, it is advisable to use the vSphere Web Client plugin, but for the purposes of explanation I will walk through the process using the FlashArray Web Interface and the vSphere Web Client.

In this environment, I have an eight-node ESXi cluster—each host is zoned to both FlashArrays.

ac50.png

The vCenter cluster:

ac51.png

The corresponding host group on the first FlashArray:

ac52.png

And on the second FlashArray: 

The first step is to create a volume on my FlashArray in site A and add it to my pod “vMSC-pod01”.

ac53.png

The pod is also not yet stretched to the FlashArray in site B.

asc54.png

The next step is to add it to my host group on the FlashArray in site A.

ac55.png

ac56.png

My volume is now in the host group “Uniform” and is in an un-stretched pod on the FlashArray in site A.

ac57.png

The next step is to rescan the vCenter cluster.

ac58.png
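
The cluster-wide rescan can also be performed programmatically. The pyVmomi sketch below rescans all HBAs and VMFS on every host in the cluster; the vCenter address, credentials, and cluster name are placeholders.

    # Minimal sketch (pyVmomi): rescan all HBAs and VMFS on every host in the
    # cluster after presenting the new volume.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                      pwd="password", sslContext=ctx)
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "vMSC-Cluster")

    for host in cluster.host:
        host.configManager.storageSystem.RescanAllHba()
        host.configManager.storageSystem.RescanVmfs()

    Disconnect(si)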

Once the rescan is complete, click on one of the ESXi hosts and then go to the Configure tab, then Storage Devices and select the new volume that was provisioned and look at the Paths tab.

ac59.png

I currently have 12 paths to the new volume on the FlashArray in site A. All of them are active for I/O.

The next step is to stretch the pod to the FlashArray in site B.

ac60.png

As soon as the FlashArray is added, the pod will start synchronizing and when it is complete the pod will go fully online and the volume will be available on both FlashArrays.

ac61.png

Now to have the hosts see it on the second FlashArray, add the volume to the proper host or host group on that FlashArray as well.

ac62.png

Now to see the additional paths, rescan the ESXi cluster.

ac63.png

Once the rescan completes, the new paths to the volume via the second FlashArray will appear.

ac64.png

ESXi supports up to 32 paths per volume, so do not provision more paths than that. If the per-volume count exceeds 32, paths will be dropped unpredictably, possibly causing uneven access to the arrays.

Preferred Paths

The default behavior is that all paths presented to a host will be actively used by ESXi—even the ones from the remote FlashArray. When replication occurs over extended distances, this is generally not ideal. In situations where the sites are far apart, two performance-impacting things occur:

  • Half of the writes (assuming both FlashArrays offer an equal amount of paths for each device) sent from a host in site A will be sent to the FlashArray in site B. Since writes must be acknowledged in both sites, this means the data traverses the WAN twice. First the host issues a write across the WAN to the far FlashArray, and then the far FlashArray forwards it back across the WAN to the other FlashArray. This adds unnecessary latency. The optimal path is for the host to send writes to the local FlashArray and then the FlashArray forwards it to the remote FlashArray. In the optimal situation, the write must only traverse the WAN once.
  • Half of the reads (assuming both FlashArrays offer an equal amount of paths for each device) sent from a host in site A will be sent to the FlashArray in site B. Reads can be serviced by either side, and for reads there is no need for one FlashArray to talk to the other. So a read need not ever traverse the WAN in normal circumstances. Servicing all reads from the local array to a given host is the best option for performance.

The FlashArray offers an option to intelligently tell ESXi which FlashArray should optimally service I/O in the event an ESXi host can see paths to both FlashArrays for a given device. This is a FlashArray host object setting called Preferred Arrays.

In a situation where the FlashArrays are in geographically different datacenters, it is important to set the preferred array for a host on BOTH FlashArrays.

For each host, login to the FlashArray Web Interface for the array that is local to that host. Click on the Storage section, then the Hosts tab, then choose the host to be configured. Then in the Details panel, click on the Add Preferred Arrays option.

BEST PRACTICE: For every host that has access to both FlashArrays that host an ActiveCluster volume, set the preferred FlashArray for that host on both FlashArrays. Tell FlashArray A that it is preferred for host A. Tell FlashArray B that FlashArray A is preferred for host A. Doing this on both FlashArrays allows a host to automatically know which paths are optimized and which are not.

ac65.png

Choose that FlashArray as preferred for that ESXi host and click Add.

ac66.png

If that same host exists on the remote FlashArray, login to the remote FlashArray Web Interface. Click on the Storage section, then the Hosts tab, then choose the host to be configured. Then in the Details panel, click on the Add Preferred Arrays option.

ac67.png

Choose the earlier FlashArray as preferred for that ESXi host and click Add.

ac68.png

It can then be seen in vSphere that half of the paths will now be marked as Active and the other half will be marked as Active (I/O). The Active (I/O) paths are the paths which are used for VM I/O. The other paths will only be used if the paths to the preferred FlashArray go away.

ac69.png

When the preferred array setting has been turned off/on or changed, the FlashArray issues 6h/2a/6h (Sense Key/ASC/ASCQ), which translates to UNIT ATTENTION ASYMMETRIC ACCESS STATE CHANGED, to the host to inform it of the path state change proactively.
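
The preferred array setting can be scripted as well. The sketch below records the same preference on both arrays, per the best practice above; the preferred_array keyword and the array and host names are assumptions to verify against the installed purestorage client and Purity REST version.

    # Minimal sketch: set the preferred array for a host on BOTH FlashArrays.
    import purestorage

    array_a = purestorage.FlashArray("flasharray-a.example.com", api_token="A-API-TOKEN")
    array_b = purestorage.FlashArray("flasharray-b.example.com", api_token="B-API-TOKEN")

    # Host "esxi-a01" lives in site A, so array A is its preferred array.
    # The same preference is recorded on both arrays.
    array_a.set_host("esxi-a01", preferred_array=["flasharray-a"])
    array_b.set_host("esxi-a01", preferred_array=["flasharray-a"])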

Non-Uniform Configuration

In a non-uniform configuration, hosts only have storage access to the FlashArray local to them. Therefore, in the case of a SAN or storage failure, the hosts local to that array will lose all connectivity to the storage. 

To start, I will create a new VMFS volume on my FlashArray. To expedite the process, it is advisable to use the vSphere Web Client plugin, but for the purposes of explanation I will walk through the process using the FlashArray Web Interface and the vSphere Web Client.

In this environment, I have an eight-node ESXi cluster—4 are zoned to FlashArray 1 and the other four are zoned to FlashArray 2.

ac70.png

The vCenter cluster:

ac71.png

In a non-Uniform environment, only the hosts local to a FlashArray have storage connectivity to it. So on FlashArray 1, the host group only includes four hosts:

ac72.png

And on the second FlashArray, the other four hosts:

ac73.png

The first step is to create a volume on my FlashArray in site A and add it to my pod “vMSC-pod01”.

ac74.png

The pod is not yet stretched to the FlashArray in site B.

ac75.png

The next step is to add it to my host group on the FlashArray in site A.

ac76.png

ac77.png

My volume is now in the host group “Non-Uniform” and is in an un-stretched pod on the FlashArray in site A.

ac78.png

The next step is to rescan the vCenter cluster.

ac79.png

Once the rescan is complete, click on one of the ESXi hosts that has access to the FlashArray that currently hosts the volume, then go to the Configure tab, then Storage Devices, select the new volume that was provisioned, and look at the Paths tab.

ac80.png

I currently have 12 paths to the new volume on the FlashArray in site A. All of them are active for I/O. Four hosts will have access and four will not.

The next step is to stretch the pod to the FlashArray in site B.

ac81.png

As soon as the FlashArray is added, the pod will start synchronizing and when it is complete the pod will go fully online and the volume will be available on both FlashArrays.

ac82.png

Now to have the other four hosts see it on the second FlashArray, add the volume to the proper host or host group on that FlashArray as well.

ac83.png

Now to see the additional paths, rescan the ESXi cluster.

ac84.png

Once the rescan completes, the other four hosts will now have paths to the volume via the second FlashArray.

ac85.png

The original four hosts will have access to the volume via paths on the first FlashArray.

ac86.png

vSphere HA

When enabling vSphere HA, it is important to follow VMware best practices and recommendations. Pure Storage recommendations do not differ from standard VMware requirements, which can be found in VMware's vSphere HA documentation.

Pure Storage does not support vSphere versions prior to 5.5 with ActiveCluster.

BEST PRACTICE: Pure Storage recommends (but does not require) a stretched layer 2 network in stretched cluster environments. This allows VMs to be moved or failed over to hosts that are in a different site without reconfiguring their network information.

To enable automatic failover of virtual machines in the case of a failure, ensure that vSphere HA is turned on. To do this, go to the Hosts and Clusters view and click on the desired cluster in the inventory pane. Then click on the Configure tab and Edit. Select Turn on vSphere HA.

ac87.png

BEST PRACTICE: Pure Storage recommends enabling vSphere HA on vCenter clusters. 
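
For environments that automate cluster configuration, the pyVmomi sketch below turns on vSphere HA (with host monitoring) for a cluster. The vCenter address, credentials, and cluster name are placeholders.

    # Minimal sketch (pyVmomi): enable vSphere HA with host monitoring on a cluster.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                      pwd="password", sslContext=ctx)
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "vMSC-Cluster")

    das = vim.cluster.DasConfigInfo(enabled=True, hostMonitoring="enabled")
    spec = vim.cluster.ConfigSpecEx(dasConfig=das)
    cluster.ReconfigureComputeResource_Task(spec, True)  # modify=True merges settings

    Disconnect(si)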

Proactive HA is a feature introduced in vSphere 6.5 that integrates with server hardware monitoring services that can detect and inform ESXi of specific failures, like fans, memory, or power supplies. This feature has no direct connection with ActiveCluster or storage failures (they are monitored by standard vSphere HA settings) and therefore Pure Storage has no specific recommendation on the enabling/disabling or configuration of Proactive HA.

For standard vSphere HA, there are other settings that should be verified and set which are described in the following sub-sections.

Host Failure Response

In the case of a failure of a host, surviving ESXi hosts take over and reboot the affected virtual machines on themselves. A failure could include host power loss, kernel crash or hardware failure.

ac88.png

BEST PRACTICE: Pure Storage recommends leaving host monitoring enabled.

Host monitoring is enabled by default when vSphere HA is enabled. Pure Storage recommends leaving this enabled and has no specific recommendations for its advanced settings (see VMware documentation here). 

Datastore with PDL

For environments running ESXi 6.0 or later, the ability to respond to Permanent Device Loss (PDL) was added into vSphere HA.

PDL occurs when a storage volume is no longer accessible to an ESXi host while communication between the host and array has not stopped—SCSI interaction can continue (exchange of SCSI operations and response codes), but that specific volume is no longer available to that host. The array will inform ESXi that the volume is no longer accessible using specific SCSI sense codes, and ESXi will then stop sending I/O requests to that volume.

More information on PDL can be found in the following VMware KB:

https://kb.vmware.com/s/article/2004684

The default behavior of PDL response is for vSphere HA to not restart VMs on other hosts when PDL is encountered. Especially in non-Uniform configurations, Pure Storage recommends enabling this setting by choosing Power off and restart VMs.

ac89.png

In Uniform configurations, enabling this setting is less important since the volume is presented via two arrays, and a full PDL would require the volume to be removed from both FlashArrays. While the accidental removal of a host's access to the volume via both arrays is less likely to occur, it is still possible and therefore Pure Storage recommends enabling Power off and restart VMs.

BEST PRACTICE: Pure Storage recommends setting PDL response to Power off and restart VMs.

Datastore with APD

In many failure scenarios, ESXi cannot determine if it has lost access to a volume permanently (e.g. volume removed) or temporarily (e.g. network outage). In the case of a failure where ESXi cannot communicate with the underlying array, the array cannot send the appropriate SCSI sense codes and ESXi is unable to determine the type of loss. This state is referred to as All Paths Down (APD).

If such a failure occurs, there is a timeout of 140 seconds, after which ESXi considers the volume to be in an APD state. Once this occurs, all non-virtual-machine I/O to the storage volume is terminated, but virtual machine I/O is still retried indefinitely. ESXi 6.0 introduced the ability for vSphere HA to respond to this situation.

The APD timeout of 140 seconds is controlled by an advanced ESXi setting called Misc.APDTimeout. Pure Storage recommends leaving this at the default; it should generally only be changed under the guidance of VMware or Pure Storage support. Reducing this value can lead to false positives of APD occurrences.

APD response options:

  • All Paths Down (APD) Failure Response: by default, vSphere HA does nothing in the face of APD. Pure Storage recommends setting this option to either Power off and restart VMs – Conservative or Power off and restart VMs – Aggressive. The conservative option will only power-off VMs if it is sure that they can be restarted elsewhere. The aggressive option will power them off regardless and make a best effort to restart them elsewhere. Pure Storage has no specific recommendation for either and it is up to the customer to decide. Pure Storage does recommend choosing one of the two Power off options and not leaving it Disabled or set to Issue Events.
  • Response recovery: by default, if an APD situation recovers before a power-off has occurred, ESXi will do nothing. In some cases, it might be preferable to have the VMs restarted after a prolonged APD occurrence as some applications and operating systems do not respond well to a lengthy storage outage. In this case, vSphere HA can react to the temporary loss and subsequent recovery of storage by resetting the affected virtual machines. Pure Storage has no specific recommendations on this setting.
  • Response delay: by default, vSphere HA will not immediately begin powering-off VMs when their storage has reached an APD state. Once the 140-second timeout (or whatever it has been set to) has been reached, vSphere HA will wait an additional 3 minutes before powering off affected virtual machines. This wait period can be reduced or increased as needed. Pure Storage does not have specific recommendations on this value. If the value is decreased, failovers after APD responses will happen more quickly, but this could also lead to unnecessary failovers (the storage comes back to the original host quicker than the VMs can be restarted on another host).

In order for VMs to be restarted by VMware HA in the event of an ActiveCluster failover, it is required to set the vSphere HA APD response to Power off and restart VMs. A scripted configuration sketch follows.
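
The sketch below shows one way to apply the PDL and APD responses described above through pyVmomi. It is a hedged example rather than the only method: the property names come from the vSphere API's VM Component Protection settings, the enum strings ("restartAggressive", "restartConservative") should be confirmed against the vSphere version in use, and the connection details are placeholders.

    # Minimal sketch (pyVmomi): set cluster-wide VMCP responses so that
    # PDL = "Power off and restart VMs" and APD = "Power off and restart VMs -
    # Conservative". Enum strings and names are assumptions to verify.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                      pwd="password", sslContext=ctx)
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "vMSC-Cluster")

    vmcp = vim.cluster.VmComponentProtectionSettings(
        vmStorageProtectionForPDL="restartAggressive",    # Power off and restart VMs
        vmStorageProtectionForAPD="restartConservative",  # conservative APD restart
        vmTerminateDelayForAPDSec=180)                    # 3-minute response delay
    das = vim.cluster.DasConfigInfo(
        enabled=True,
        vmComponentProtecting="enabled",                  # turn VMCP on
        defaultVmSettings=vim.cluster.DasVmSettings(
            vmComponentProtectionSettings=vmcp))
    cluster.ReconfigureComputeResource_Task(vim.cluster.ConfigSpecEx(dasConfig=das), True)

    Disconnect(si)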

VM Monitoring

vSphere HA also offers the ability to detect a guest operating system crash when VMware Tools is installed in the guest and attempt a restart. This has no direct relevance to ActiveCluster and therefore Pure Storage has no specific recommendations for enabling/disabling and configuring this feature.

ac90.png

Heartbeat Datastores

Prior to vSphere 5.0, if a host management network was down but the host and its VMs were running fine (and even the VM network itself was fine), vSphere HA might shut down the VMs on that host and boot them up elsewhere, even though there was no need. This would cause unnecessary downtime of the affected VMs. This occurred because the other hosts could not detect whether the host experienced a true failure or simply could not communicate over the network.

In order to detect the difference between a host failure and network isolation, VMware introduced Datastore Heartbeating. With datastore heartbeating, each host constantly updates a heartbeat region on a shared datastore. If network communication is lost to a host, the heartbeat region of one or more heartbeat datastores is checked. If the host that lost network communication is still updating its heartbeat region, it is considered to still be “alive” but isolated.

Responding to network isolation is discussed in the next section.

VMware offers three settings for heartbeat datastores:

  • Automatically select datastores accessible from the host.
  • Use datastores only from the specified list.
  • Use datastores only from the specified list and complement automatically if needed.

ac91.png

Pure Storage does not have strict recommendations on these settings other than that automatic selection or specify/complement are the two preferred options. Only use “Use datastores only from the specified list” if there are datastores that are not viable for heartbeating (low reliability, for instance).

Response for Host Isolation

The default behavior of vSphere HA is to not automatically restart VMs if a host has been isolated. Host isolation means that the management network is no longer receiving heartbeats from other participants in the cluster. Once this occurs, the ESXi host will ping the configured isolation address (by default the gateway address). If it cannot reach the gateway, the host considers itself isolated.

ac92.png

Pure Storage maintains no official recommendations for host isolation settings as this can be very environment specific. Some considerations should be evaluated, though—examples of important ones are below:

  • If isolation response is enabled, and the storage is presented over the TCP/IP network (iSCSI) it is likely that a network partition of the host will also affect iSCSI access. So power-off and restart is likely the desired option since the VM will have lost its storage and a graceful power-off is not possible.
  • If isolation response is enabled, and the storage is presented over Fibre Channel, shut down and restart VMs is likely the preferred option, as Fibre Channel access will likely persist in the event of TCP/IP network loss and a graceful shutdown will be the friendlier option.
  • If the management network and the VM networks are physically the same equipment, it is likely that if one goes down, so do both. In this case, VMs often should be set to be restarted, especially if they need access to the network to run properly.
  • If the VMs either do not need network access to run properly or the management network is physically separate from the VM network, enabling VM restart may not be the best idea as it will just introduce unnecessary downtime. In this case, it might just be a better idea to configure vCenter alerts for host isolation and fix the management network issue and let the VMs continue to run. For configuring vCenter alerts, refer to VMware documentation.

Virtual Machine Overrides

vSphere HA also provides the ability to configure all the above settings on a per-VM basis. If certain virtual machines need to be recovered in a special way, or more forcefully, the cluster-wide settings can be overridden for specific virtual machines as needed.

Setting these overrides is optional and is environment specific—Pure Storage has no specific recommendations concerning overrides.

To configure VM overrides, click on the cluster in the vCenter inventory, then the Configure tab then the VM Overrides section in the side panel that appears.

ac93.png

Click Add to create an override.

ac94.png

Many of these settings are described in the previous sub-sections. But there are also some new specific settings (some of the features are unique to vSphere 6.5 and later):

  • Automation level—this is enabled if vSphere DRS is turned on and can override the cluster DRS automation level for this VM
  • VM restart priority—the importance of the virtual machine being restarted. The higher the setting, the more priority is given to restarting that VM.
  • Start next priority VMs when—dictates what condition vSphere should wait for before considering a restart priority group fully rebooted and ready.
  • Additional delay—how long should vSphere wait after the previous priority group completes before starting on the next one.
  • Or after timeout occurs at—specifies how long vSphere should wait for that condition before moving on to the next priority group anyway.

VM and Host Groups and Affinity Rules

When it comes to the logistics of a host failure, there is nothing special about the virtual machine recovery process with vSphere HA when combined with ActiveCluster. vSphere HA finds a host that has access to the storage of the failed virtual machines and restarts them. That process is no different.

With that being said, it might be desirable to help vSphere HA choose what host (or hosts) to restart recovered VMs on. Since ActiveCluster can present the same storage in two separate datacenters, and vMSC allows hosts in separate datacenters to be in the same cluster, a virtual machine running on a host in datacenter A that fails could be restarted on a host in datacenter B, even though there are healthy, available hosts in datacenter A still left in that cluster.

There is nothing intrinsically wrong with that, but due to application relationships it might be preferential to keep certain VMs in the same physical datacenter for performance, network, or even business compliance reasons.

vSphere HA offers a useful tool to make ensuring this simple—VM/host affinity rules.

VM/host affinity rules allow the creation of groups of hosts and groups of VMs, along with rules to control/dictate their relationships. In this scenario, I have 8 hosts: 4 in datacenter A and 4 in datacenter B (as denoted with “a” or “b” in the host names).

ac95.png

Click on the cluster object in the inventory view, then the Configure tab followed by VM/Host rules in the side panel that appears. Click on the Add button to create a host group.

ac96.png

In the window that appears, assign the host group a name and choose the type of Host Group. Then click Add to add hosts.

ac97.png

Select the hosts that are in the same datacenter and click OK.

ac98.png

Confirm the configuration and click OK.

ac99.png

Repeat the process for the hosts in the other datacenter by putting them in their host group, in this case, datacenter B.

ac100.png

The next step is to create a VM group. Create as many VM groups as needed, but before their creation, keep in mind the types of rules that can be associated with VM and host groups:

  • Keep all VMs in a group on the same host—this makes sure that all VMs in a given VM group are all on the same host. This is useful for certain clustering applications or for possible multi-tenant environments where VMs must be grouped in a specific way.
  • Keep all VMs in a group on separate hosts—this makes sure the VMs in a group are never run on the same host. This is useful for clustered applications that must survive a host failure (i.e. not all eggs in the same basket) or for performance-sensitive applications that use too many resources to ever share a host.
  • Keep all VMs on the same host group—this is a bit more flexible. This allows VMs in a group to be required to be on a specific host group, or to never be on a specific host group. It also offers the ability to set preference with “should” rules. Rules can be set that VMs “should” be on a specific host group or “should not” be on a specific host group. When a “should” rule is created, vSphere will always use the preferred host group if one or more of those hosts are available; if none are, then and only then will it use non-preferred hosts. If a “must” rule is selected, then in the absence of an available host in the preferred host group, the VMs in the VM group will not be recovered automatically.
  • Boot one group of VMs up prior to booting up a second group of VMs—this allows some priority ordering of reboots. This is useful if certain VMs rely on applications in other VMs. If those applications are not running first, the dependent VMs will either fail to boot, or their applications will fail to start. This is common in the case of database and application servers, or services like DNS or LDAP.

BEST PRACTICE: When creating host affinity rules, it is generally advisable to use a “should” or “should not” rule over a “must” or “must not” rule as a “must” rule could prevent recovery.

With these in mind, VM groups can be created. In this environment, four of the eight VMs will be put into VM group A and four will be put into VM group B. To do this, click on the Add button in VM/Host Groups as shown earlier for the host groups.

ac101.png

Assign the group a name, choose VM group as the type and then click Add to select VMs.

ac102.png

Click OK and then OK again to create the VM group. Repeat as necessary for any VM groupings that are needed.

ac103.png

Note that VMs can be placed into more than one group—though it is recommended to keep VM membership to as few groups as possible for ease of rule management.

ac104.png
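
VM groups can be created the same way through the API. The sketch below (again a pyVmomi example reusing the earlier connection and helper) builds a VM group for datacenter A; the VM names are placeholders for whatever VMs should stay in that site.

    from pyVmomi import vim

    # `cluster` and find_by_name() are assumed from the connection sketch shown earlier.
    vms_a = [find_by_name(vim.VirtualMachine, name)
             for name in ('VM01', 'VM02', 'VM03', 'VM04')]  # placeholder VM names

    vm_group = vim.cluster.GroupSpec(
        operation='add',
        info=vim.cluster.VmGroup(name='Datacenter-A-VMs', vm=vms_a))

    cluster.ReconfigureComputeResource_Task(
        spec=vim.cluster.ConfigSpecEx(groupSpec=[vm_group]), modify=True)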

Once VM groups and/or host groups have been created, rules can be specified. To create and assign a rule, click on the VM/Host Rules section under the cluster Configure tab. Click Add to create a new rule.

ac105.png

Give the rule an informative name and choose a rule type, the definitions of which are described earlier in this section.

ac106.png

Depending on the chosen rule type, the options for the rule are different. In the above example, VMs in group A should be put on hosts in group A. Since the “should” rule was chosen, it could be broken in the case that no hosts in group A are available.

ac107.png

In the above environment, a second rule was created to keep VM group B VMs running on host group B hosts. Since this environment has vSphere DRS enabled and set to automatic, DRS automatically moved the VMs to the proper hosts as soon as the rule was committed.

ac108.png
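
For reference, the same “should” rule can be expressed through the API. In this pyVmomi sketch (reusing the earlier connection), mandatory=False produces a “should” rule; setting it to True would create a “must” rule, which, per the best practice above, could prevent HA recovery if no preferred hosts survive. The group names match the examples used earlier.

    from pyVmomi import vim

    # `cluster` is assumed from the connection sketch shown earlier.
    rule = vim.cluster.RuleSpec(
        operation='add',
        info=vim.cluster.VmHostRuleInfo(
            name='VMs-A-should-run-on-Hosts-A',
            enabled=True,
            mandatory=False,  # False = "should" rule; True = "must" rule
            vmGroupName='Datacenter-A-VMs',
            affineHostGroupName='Datacenter-A-Hosts'))

    cluster.ReconfigureComputeResource_Task(
        spec=vim.cluster.ConfigSpecEx(rulesSpec=[rule]), modify=True)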

Creating VM and host affinity rules provides the administrator with more direct and proactive control in preparation for an outage and subsequent HA and DRS VM placement response.

Failure Scenarios

A core part of understanding vSphere HA and ActiveCluster is understanding how vSphere and ActiveCluster respond to failures. These failures are separated into two broad categories:

  1. Storage access lost. This could be due to the storage array failing, a pod going inactive, or loss of connectivity.
  2. Host failure. This could be due to the physical host failing, the hypervisor crashing, or network partitioning of the host.

vSphere HA Host Failure Response

A common failure is some type of host failure that renders ESXi unable to keep running virtual machines. This could be a power failure, a host kernel crash, or a similar event.

vSphere HA will take over and restart virtual machines on other hosts when a host failure has been detected.

In this environment, there is a VM named VM02 running on host ac-esxi-a-06:

ac109.png

A host failure occurs and the host is marked as lost:

ac110.png

At this point, vSphere HA kicks in and restarts the VM on another host. Since this environment has a rule that this VM should only be on hosts in host group A, it is restarted on an “A” host.

ac111.png

This behavior is not unique to vSphere HA and ActiveCluster—this is a property of shared storage in general when combined with vSphere HA. The true benefit of ActiveCluster with vSphere HA is when all of the compute goes down in the site local to a set of VMs.

Continuing with the failure above, the remaining three hosts in host group A fail too (the four of them comprise all hosts in datacenter A). Since there are no hosts left in host group A, vSphere HA follows the “should” rule and has no choice but to reboot the VMs on hosts in host group B, which are in the other datacenter.

ac112.png

Since ActiveCluster presents the same VMFS datastore to both datacenters from a FlashArray in each datacenter, the hosts in the remote datacenter can reboot the VMs from the failed hosts in a different datacenter with ease.

The behavior of host failure is not impacted by whether or not the cluster is configured for Uniform or Non-Uniform connectivity.

vSphere HA and Storage Failure Response

In more severe failures, the entire underlying storage system or SAN can be lost, leaving ESXi hosts running but with no storage access. This is a different type of failure than a host failure: the host is still online, but it cannot continue to run its VMs because there are no available paths to the local FlashArray.

Depending on the failure and on the configuration of the stretched cluster (uniform or non-uniform), vSphere reacts differently and the failover mechanism changes.

Storage Failure Response with a Non-Uniform Stretched Cluster

As discussed previously in this paper, a non-uniform stretched cluster is a cluster of ESXi hosts split across two physical sites, usually half in one datacenter and half in the other. Each datacenter has a FlashArray, and those two FlashArrays have ActiveCluster enabled on one or more volumes, allowing the storage volume(s) to be presented simultaneously at both sites. The non-uniform portion of this configuration defines which paths to the storage the hosts see. In a non-uniform configuration, hosts only have paths to the volume (or volumes) via the FlashArray that is local to their datacenter.

In other words, if the FlashArray becomes unavailable, the ESXi hosts local to it no longer have access to the storage and a vSphere HA failover must occur to reboot any affected VMs on the remote hosts in the cluster that have access to the storage via the remote FlashArray.

In this scenario, essentially any storage-related failure will cause a vSphere HA failover, such as:

  • Loss of local FlashArray
  • Accidental or purposeful removal of volume from access to hosts by administrator
  • Loss of SAN connectivity due to failure, power loss or administrative change.
  • Failure of host bus adapters (HBAs) in host or connecting cables

These failures all lead to the same result in the presence of non-uniform configurations: loss of storage connectivity of the host and a vSphere HA failover.

In the following environment, there are eight hosts total, four are in site “A” and four are in site “B”. There is a VMFS datastore presented to all hosts in the cluster, with a total of 32 VMs on it.

ac113.png

The datastore is hosted on a FlashArray volume that has been stretched to both FlashArrays using ActiveCluster.

ac114.png

Since this is a non-uniform configuration, the “A” hosts only have access to the volume via paths to the “A” FlashArray, and the “B” hosts only have access to the volume via paths to the “B” FlashArray.

In this cluster, half of the 32 virtual machines are running on “B” hosts and half are running on “A” hosts, so a storage failure on either side will cause 16 VMs to be restarted on the remaining hosts on the other side.

For this example, FlashArray “A” will experience a failure:

ac115.png

This causes the VMFS datastore to become inaccessible on the “A” hosts.

ac116.png

But because it is presented through both FlashArrays via ActiveCluster, the “B” hosts continue to have access to the datastore:

ac117.png

If APD/PDL responses have been enabled, once the relevant timeouts have been reached (dependent on the APD/PDL timeout configuration), ESXi will shut down the VMs on the hosts that lost storage access and restart them on hosts with surviving storage connections to the volume. While shut down, the VMs are briefly marked as inaccessible.

ac118.png
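
The exact behavior above depends on the cluster's VM Component Protection (VMCP) configuration. If there is any doubt about what is currently configured, the effective APD/PDL responses can be read back through the API; the following pyVmomi sketch simply prints the cluster defaults (property names are from the vSphere 6.0+ API, and the values may be unset if HA or VMCP has not been configured).

    # `cluster` is assumed from the connection sketch shown earlier.
    das = cluster.configurationEx.dasConfig

    # defaultVmSettings can be None if vSphere HA has never been configured.
    vmcp = das.defaultVmSettings.vmComponentProtectionSettings

    print('PDL response:            ', vmcp.vmStorageProtectionForPDL)
    print('APD response:            ', vmcp.vmStorageProtectionForAPD)
    print('APD additional delay (s):', vmcp.vmTerminateDelayForAPDSec)
    print('Response on APD cleared: ', vmcp.vmReactionOnAPDCleared)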

The VMs will then come back online and be powered-on elsewhere. In the below screenshot, the failed VMs are in various states of recovery:

ac119.png

When the FlashArray comes back online, the storage can then be seen again by the “A” hosts. If there are host-affinity rules, the VMs will be moved back almost immediately by vSphere DRS. Otherwise, they will remain where they are until manually moved, another failure occurs, or resource usage demands vSphere DRS rebalance the VMs across the cluster.

ac121.png

Storage Failure Response with a Uniform Stretched Cluster

A uniform stretched cluster means that all hosts in the cluster have paths to a stretched volume through both FlashArrays servicing that volume via ActiveCluster. This configuration provides an additional level of resilience for virtual machines, as VMs can continue to run non-disruptively through the failure of an entire FlashArray, or the connectivity to it.

In the case where all hosts in one site lose all storage access (which usually means a local SAN failure), such that those hosts can access neither their local FlashArray nor the remote one, the failover process is identical to the non-uniform failover process shown in the previous section: vSphere HA will restart the affected virtual machines on the hosts in the remote site.

For uniform configurations, the case where just a single array fails is different. In this case, VMs can continue to run on their host, as those hosts can still access the storage, but via paths to the remote FlashArray. Therefore, this is not a case of a vSphere HA restart (which has downtime until the VM can be rebooted) but it is a case of multipathing simply failing over to the paths to the remote FlashArray. A multipathing failover is entirely non-disruptive, as a VM reboot is not required.

The below environment is configured in a uniform fashion, so all eight hosts have access to the VMFS protected via ActiveCluster through both FlashArrays:

ac131.png

Furthermore, the preferred FlashArray is set on the host object, so that only the paths from a given host to its local FlashArray are in use (denoted by the “(I/O)” marker).

This ensures that, while they are available, reads and writes go down optimized paths to provide the best possible performance. The non-optimized paths are the paths to the VMFS via the remote FlashArray and will only be used in the absence of any optimized paths.
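
To confirm which paths a given host is actually using, path states can also be checked programmatically. The pyVmomi sketch below (reusing the earlier connection) lists each device's paths and their states per host; mapping a path name back to a specific FlashArray depends on the environment's target naming and is left out here.

    # `cluster` is assumed from the connection sketch shown earlier.
    for host in cluster.host:
        storage = host.configManager.storageSystem.storageDeviceInfo
        luns_by_key = {lun.key: lun for lun in storage.scsiLun}
        for mp_lun in storage.multipathInfo.lun:
            device = luns_by_key[mp_lun.lun].canonicalName
            for path in mp_lun.path:
                # path.state is 'active', 'standby', 'dead', etc.; the vSphere client
                # additionally flags the active-optimized paths with "(I/O)".
                print(host.name, device, path.name, path.state)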

After a failure of a FlashArray, the datastores from that array that are not protected by ActiveCluster go offline:

ac132.png

The ActiveCluster-enabled volume “vMSC-VMware::Uniform-FlashArrayVMFS” stays online.

The paths to that volume through the failed FlashArray are gone, but the paths to the remote site remain available and become active.

ac133.png

Comparing this image with the previous image of the paths, the L244 paths were originally the active paths; after the failure, the L253 paths (the paths to the secondary FlashArray) are the active ones.

At this point, the virtual machines are running on non-optimized paths, meaning that their I/Os are traversing the WAN to the remote FlashArray, which will incur greater latency than if the VMs were running on hosts local to their FlashArray. If the FlashArray failure is expected to be extended, it might be advisable to vMotion the VMs running on non-optimized paths to hosts in the other datacenter that have paths to their local FlashArray available. If the failed FlashArray is expected to be recovered soon (as in the case of a temporary loss of power), the simplest option may be to just leave the VMs where they are. They will resume running on optimized paths as soon as the FlashArray comes back online.
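
If the decision is to move the affected VMs rather than wait, the vMotion can be performed in the client or scripted. A minimal pyVmomi sketch follows; the VM and destination host names are placeholders, and waiting on the returned task is omitted for brevity.

    from pyVmomi import vim

    # find_by_name() is assumed from the connection sketch shown earlier.
    vm = find_by_name(vim.VirtualMachine, 'VM02')                       # example VM to move
    target = find_by_name(vim.HostSystem, 'ac-esxi-b-01.example.com')   # example host local to the surviving array

    # Standard vMotion: compute moves; storage stays on the stretched VMFS volume.
    vm.MigrateVM_Task(host=target,
                      priority=vim.VirtualMachine.MovePriority.defaultPriority)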

FlashArray Storage Access

The following table describes different failure scenarios for ActiveCluster environments and how storage availability is affected.

Solution component status (One Array, Other Array, Replication Link, Mediator) and the resulting access to storage:

One Array | Other Array | Replication Link | Mediator | Access to Storage
UP        | UP          | UP               | UP       | Available on both arrays
UP        | DOWN        | UP               | UP       | Available on one array
UP        | UP          | DOWN             | UP       | Available on one array
UP        | UP          | UP               | DOWN     | Available on both arrays
UP        | DOWN        | DOWN             | UP       | Available on one array
UP        | UP          | DOWN             | DOWN*    | Unavailable
UP        | DOWN        | UP               | DOWN*    | Unavailable
DOWN      | DOWN        | N/A              | N/A      | Unavailable

* These rows refer to simultaneous failures of other components while the mediator is unavailable. If the mediator becomes unavailable after an array failure or a replication link failure has already been sustained, access to the mediator is no longer required and access to storage remains available on one array.

The following charts explain some of the detailed failure scenarios for ActiveCluster and the surrounding network.

Host and Storage Network Failures

Failure scenario: Single or multiple host failure

Failure behavior: Applications can automatically failover to other hosts in the same site or to hosts in the other site connected to the other array. This is driven by VMware HA, assuming clusters are stretched between sites.

Failure scenario: Stretched SAN fabric outage, FC or iSCSI (failure of the SAN interconnect between sites)

Failure behavior: Host IO automatically continues on local paths in the local site.

Uniform connected hosts:

  • experience some storage path failures for paths to the remote array and continue IO on paths to the local array.
  • in each site will maintain access to local volumes with no more than a pause in IO.

Non-uniform connected hosts:

  • do not have a SAN interconnect between sites, so this scenario is not applicable.

Failure scenario: SAN fabric outage in one site

Failure behavior: Applications can automatically failover to hosts at the other site connected to the other array. This is driven by host cluster software (VMware HA, Oracle RAC, SQL Server clustering, etc.), assuming clusters are stretched between sites.

Uniform connected hosts:

  • in the site without the SAN outage, experience some storage path failures for paths to the remote array and continue IO on paths to the local array.
  • in the site with the SAN outage, will experience total loss of access to volumes, and applications must failover to the other site as mentioned above.

Non-uniform connected hosts:

  • in the site without the SAN outage, will maintain access to local volumes.
  • in the site with the SAN outage, will experience total loss of access to volumes, and applications must failover to the other site as mentioned above.

The preceding sections described how ESXi hosts in a stretched cluster configuration respond to these failures, including:

  • Host failures—what happens to the VMs running on a host when it fails
  • Array failures—what happens when a FlashArray fails

The focus there was not on why the array goes down, but rather on how ESXi and vSphere HA react when it does. The following table summarizes array, replication network, and site failures from the FlashArray perspective.

Array, Replication Network, and Site Failures

Failure scenario: Local HA controller failover in one array

Failure behavior: After a short pause for the duration of the local HA failover, host I/O will continue to both arrays without losing RPO-Zero.

Async replication source transfers may resume from a different array than before the failover.

Failure scenario: Replication link failure

Failure behavior: After a short pause, host IO continues to volumes only on the array that contacts the mediator first. This is per pod. Failover is automatic and transparent, and no administrator intervention is necessary.

Uniform connected hosts:

  • after a short pause in IO, continue IO to the array that won the race to the mediator.
  • experience some storage path failures for paths to the array that lost the race to the mediator.
  • in the mediator-losing site will maintain access to volumes remotely across the stretched SAN to the mediator-winning site.

Non-uniform connected hosts:

  • in the mediator-winning site will maintain access to volumes with no more than a pause in IO.
  • in the mediator-losing site will experience total loss of access to volumes.
  • use host cluster software to recover the apps to a host in the mediator-winning site. This may be automatic depending on the type of cluster.

Failure scenario: Mediator failure, or access to the mediator fails

Failure behavior: No effect. Host IO continues through all paths on both arrays as normal.

Failure scenario: Entire single array failure

Failure behavior: After a short pause, host IO automatically continues on the surviving array. Failover is automatic and transparent, and no administrator intervention is possible or necessary.

Uniform connected hosts:

  • in the surviving array site, after a short pause in IO, continue IO to the surviving array that was able to reach the mediator.
  • experience some storage path failures for paths to the failed array.
  • in the site where the array failed will do IO to volumes remotely across the stretched SAN to the surviving array.

Non-uniform connected hosts:

  • in the surviving array site, after a short pause in IO, continue IO to the surviving array that was able to reach the mediator.
  • in the failed array site will experience total loss of access to volumes.
  • use host cluster software to recover the apps to a host in the other site. This may be automatic depending on the type of cluster.

Failure scenario: Entire site failure

Failure behavior: After a short pause, host IO automatically continues on the surviving array. Failover of the array is automatic and transparent, and no administrator intervention is possible or necessary.

Uniform connected hosts:

  • in the surviving array site, after a short pause in IO, continue IO to the surviving array that was able to reach the mediator.
  • experience some storage path failures for paths to the array in the failed site.
  • use host cluster software to recover the apps to hosts in the surviving site. This may be automatic depending on the type of cluster.

Non-uniform connected hosts:

  • in the surviving site, after a short pause in IO, continue IO to the surviving array that was able to reach the mediator and maintain access to local volumes with no more than a pause in IO.
  • use host cluster software to recover the apps to hosts in the surviving site. This may be automatic depending on the type of cluster.

Failure scenario: Mediator failure followed by a second failure of the replication link, failure of one array, or failure of one site (the second failure occurs while the mediator is unavailable)

Failure behavior: Host IO access is lost to synchronously replicated volumes on both arrays.

This is a double-failure scenario; data service is not maintained through the failure of either array if the mediator is unavailable.

Options to recover:

  1. Restore access to either the mediator or the replication interconnect, and the volumes will automatically come back online, as per the above scenarios.
  2. Clone the pod to create new volumes with different LUN serial numbers. New LUN serial numbers will prevent hosts from automatically connecting to and using the volumes, avoiding split brain. Then re-identify and reconnect all LUNs on all hosts.

Conclusion

The combination of active-active replication on the FlashArray (ActiveCluster) and vSphere High Availability is collectively referred to as a solution called VMware vSphere Metro Storage Cluster (vMSC). The simplicity and flexibility of the FlashArray itself and the ActiveCluster feature allow administrators to configure and offer a highly available storage solution with ease.

 

References

  1. vMSC best practices: https://storagehub.vmware.com/#!/vsphere-storage/vmware-vsphere-r-metro-storage-cluster-recommended-practices
  2. VMFS deep dive: https://storagehub.vmware.com/#!/vsphere-storage/vmware-vsphere-vmfs
  3. iSCSI best practices: https://storagehub.vmware.com/#!/vsphere-storage/best-practices-for-running-vmware-vsphere-on-iscsi
  4. vSphere 5.5 HA guide: https://docs.vmware.com/en/VMware-vSphere/5.5/com.vmware.vsphere.avail.doc/GUID-63F459B7-8884-4818-8872-C9753B2E0215.html
  5. vSphere 6.0 HA guide: https://docs.vmware.com/en/VMware-vSphere/6.0/com.vmware.vsphere.avail.doc/GUID-63F459B7-8884-4818-8872-C9753B2E0215.html
  6. vSphere 6.5 HA guide: https://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.vsphere.avail.doc/GUID-63F459B7-8884-4818-8872-C9753B2E0215.html
  7. VMware ActiveCluster KB: https://kb.vmware.com/s/article/51656
  8. ActiveCluster and VMware support page: https://www.vmware.com/resources/compatibility/vcl/partnersupport.php
  9. ActiveCluster best practices: https://support.purestorage.com/FlashArray/PurityFA/Protect/Replication/ActiveCluster_Requirements_and_Best_Practices
  10. ActiveCluster documentation: https://support.purestorage.com/FlashArray/PurityFA/Protect/Replication/ActiveCluster_Solution_Overview

 

About the Author 

ac134.png

Cody Hosterman is the Technical Director for VMware Solutions at Pure Storage. His primary responsibility is overseeing, testing, designing, documenting, and demonstrating VMware-based integration with the Pure Storage FlashArray platform. Cody has been with Pure Storage since 2014 and has been working in vendor enterprise storage/VMware integration roles since 2008.

Cody graduated from the Pennsylvania State University with a bachelor's degree in Information Sciences & Technology in 2008. Special areas of focus include core ESXi storage, vRealize (Orchestrator, Automation and Log Insight), Site Recovery Manager and PowerCLI. Cody has been named a VMware vExpert from 2013 through 2017.

 

Blog: www.codyhosterman.com

Twitter: www.twitter.com/codyhosterman

YouTube: https://www.youtube.com/codyhosterman