vVols Best Practices Summary
If using Virtual Volumes and FlashArray replication, ensure that the anticipated recovery site is running vSphere 6.7 or later.
As always, please ensure you follow standard Pure Storage best practices for vSphere.
Currently, VMware does not support stretched storage with vVols. Due to limitations in both Purity//FA and vSphere, vVols is therefore not supported with ActiveCluster. vVols is also not supported with ActiveDR.
Pure Storage and VMware are actively partnering to develop support for stretched storage (ActiveCluster) with vVols and have targeted the first half of calendar year 2024 for release; release timelines are subject to change.
vVols Best Practices Quick Guidance Points
Here are some quick points of guidance for using vVols with the Pure Storage FlashArray. They are not meant to be a best practices deep dive or a comprehensive outline of all best practices for using vVols with Pure Storage; a best practices deep dive will be provided in the future. However, more explanation of the requirements and recommendations is given in the summary above.
While vVols support was first introduced with Purity 5.0.0, there have been significant fixes and enhancements to the VASA provider in later releases of Purity. Because of this, Pure has set the required Purity version for vVols to a later release.
- For general vVols use: while Purity 5.1 and 5.3 can support vVols, both releases are end of life. As such, the minimum target Purity version should be Purity//FA 6.1.8.
Pure Storage recommends that customers running vVols upgrade to Purity//FA 6.2.10 or higher.
The main reason is that these releases include VASA enhancements that improve vVols support at higher scale, the performance of Managed Snapshots, and the SPBM Replication Group Failover API at scale. For more information, please see the KB What's new with VASA Provider 2.0.0.
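As a quick way to apply the Purity guidance above, here is a minimal sketch that classifies a Purity//FA version string against the minimum (6.1.8) and recommended (6.2.10) versions. The function names and status strings are hypothetical and are not part of any Pure Storage tool; the sketch only assumes versions are written as dotted integers.

```python
# Minimal sketch (hypothetical helpers): classify a Purity//FA version
# against the vVols minimum (6.1.8) and recommended (6.2.10) versions.

def parse_purity(version: str) -> tuple[int, ...]:
    """Turn '6.2.10' into (6, 2, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.strip().split("."))


MINIMUM = parse_purity("6.1.8")
RECOMMENDED = parse_purity("6.2.10")


def purity_vvols_status(version: str) -> str:
    """Return a coarse status string for the given Purity//FA version."""
    v = parse_purity(version)
    if v < MINIMUM:
        return "below minimum"
    if v < RECOMMENDED:
        return "supported, upgrade recommended"
    return "recommended"
```

Versions are compared as integer tuples rather than strings because a plain string comparison would rank "6.2.9" above "6.2.10".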
While vSphere Virtual Volumes (VASA 2.0) was first released with vSphere 6.0, the Pure Storage FlashArray only supports VASA 3.0, which was released with vSphere 6.5. As such, the minimum required vSphere version is the 6.5 GA release. That said, later releases contain significant vVols-specific fixes, so the required and recommended versions are as follows:
- Requirement: vSphere version 6.5 U3 or higher (vSphere 6.5 and 6.7 both reached end of life in October 2022 and will reach end of technical guidance in October 2023)
- Recommended: vSphere version 7.0 Update 3f or higher
vSphere 7.0 Update 3 released several improvements for vVols and running vVols at scale. vSphere 7.0 Update 3f should be the minimum vSphere version to target for running vVols with Pure Storage in your vSphere environment.
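The vSphere requirement and recommendation above can be checked the same way. This is a minimal sketch assuming version strings written like "6.5 U3", "7.0 Update 3f" or "8.0"; the parser, its accepted formats and the status strings are hypothetical, and build numbers are ignored.

```python
import re

# Assumed input formats: "6.5 U3", "7.0 Update 3f", "8.0" (case-insensitive).
_VSPHERE_RE = re.compile(
    r"^\s*(\d+)\.(\d+)(?:\s*(?:Update|U)\s*(\d+)([a-z]?))?\s*$", re.IGNORECASE
)


def parse_vsphere(version: str) -> tuple[int, int, int, str]:
    """Return (major, minor, update, letter); a missing update counts as 0."""
    match = _VSPHERE_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized vSphere version: {version!r}")
    major, minor, update, letter = match.groups()
    return (int(major), int(minor), int(update or 0), (letter or "").lower())


REQUIRED = parse_vsphere("6.5 U3")
RECOMMENDED = parse_vsphere("7.0 Update 3f")


def vsphere_vvols_status(version: str) -> str:
    """Compare a vSphere version against the required and recommended levels."""
    v = parse_vsphere(version)
    if v < REQUIRED:
        return "below requirement"
    if v < RECOMMENDED:
        return "meets requirement, 7.0 U3f or later recommended"
    return "meets recommendation"
```

The trailing patch letter is compared as a string, so "7.0 Update 3c" sorts below "7.0 Update 3f", while any later major or update release (for example 8.0) sorts above it.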
With regard to the vSphere environment, there are some networking requirements and some strong recommendations from Pure Storage when implementing vVols in your vSphere environment.
- Requirement: NTP must be configured the same way across all ESXi hosts and vCenter Servers in the environment. The time and date must be set to the current date and time.
- Recommended: Configure Syslog forwarding for the vSphere environment.
- Requirement: Network port 8084 must be open and accessible from vCenter Servers and ESXi hosts to the FlashArray that will be used for vVols.
- Recommended: Use Virtual Machine Hardware version 14 or higher.
- Requirement: Do not run vCenter servers on vVols.
- While a vCenter Server can run on vVols, a failure on the VASA Management Path combined with a vCenter Server restart could leave the environment in a state where vCenter Server is unable to boot or start. Please see the failure scenario KB for more detail.
- Recommended: Either configure an SPBM policy to snapshot the Config vVols of all vVol VMs, or manually put the Config vVols in a FlashArray protection group with a snapshot schedule enabled.
- A snapshot of the Config vVol is required for the vSphere Plugin's VM undelete feature. Having a backup of the Config vVol also helps the recovery or rollback process for the VM in the event of an issue. A detailed KB that outlines some of these workflows can be found here.
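To confirm the port 8084 requirement from a given machine, a plain TCP connection test is often enough as a first step. This is a minimal sketch using only the Python standard library; the helper name is hypothetical, and a successful connection shows only that the port is reachable, not that the VASA Provider is healthy or that its certificate is valid.

```python
import socket

VASA_PORT = 8084  # port named in the requirement above


def port_reachable(host: str, port: int = VASA_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Name-resolution failures, refusals and timeouts all raise OSError
    subclasses, so any of them is reported as "not reachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_reachable("flasharray-ct0.example.com")` would test CT0, where the hostname is a placeholder for your controller's management address; repeat the check for CT1 and from each vCenter Server and ESXi host network path that needs access.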
Here is some more detail and color for the requirements and recommendations with the FlashArray:
- Requirement: The FlashArray Protocol Endpoint object 'pure-protocol-endpoint' must exist. The FlashArray admin must not rename, delete or otherwise edit the default FlashArray Protocol Endpoint.
- Currently, Pure Storage stores important information for the VASA Service in the pure-protocol-endpoint namespace. Destroying or renaming this object leaves VASA unable to forward requests to the database service in the FlashArray. This effectively prevents the VASA Provider from processing requests and causes the Management Path to fail. Pure Storage is working to correct and improve this implementation in a future Purity release.
- Recommendation: Create a local array admin user when running Purity 5.1 and higher. This user should then be used when registering the storage providers in vCenter.
- Recommendation: Following vSphere Best Practices with the FlashArray, ESXi clusters should map to FlashArray host groups and ESXi hosts should map to FlashArray hosts.
- Recommendation: The protocol endpoint should be connected to host groups on the FlashArray and not to individual hosts.
- Recommendation: While multiple protocol endpoints can be created manually, the default device queue depth for a protocol endpoint in ESXi is 128 and can be configured up to 4096, so additional protocol endpoints are usually unnecessary.
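The queue depth point above comes down to simple arithmetic. This rough sketch estimates how many protocol endpoints a single host would need for its combined queue depth to cover a given number of outstanding I/Os; the function is hypothetical and the outstanding I/O figure is an illustrative workload number, not a Pure Storage sizing rule.

```python
import math

DEFAULT_PE_QUEUE_DEPTH = 128  # ESXi default per protocol endpoint, per the text above
MAX_PE_QUEUE_DEPTH = 4096     # configurable upper limit, per the text above


def protocol_endpoints_needed(
    outstanding_io: int, queue_depth: int = DEFAULT_PE_QUEUE_DEPTH
) -> int:
    """Rough per-host estimate of PEs needed to queue outstanding_io requests."""
    if not 1 <= queue_depth <= MAX_PE_QUEUE_DEPTH:
        raise ValueError(f"queue depth must be between 1 and {MAX_PE_QUEUE_DEPTH}")
    if outstanding_io < 0:
        raise ValueError("outstanding_io must be non-negative")
    # At least one PE is always required.
    return max(1, math.ceil(outstanding_io / queue_depth))
```

Under this model, 1,000 outstanding I/Os would need 8 protocol endpoints at the default depth of 128, but a single protocol endpoint once its queue depth is raised to 4096, which is why raising the queue depth is generally preferable to adding endpoints.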
VASA Provider/Storage Provider
The FlashArray has a storage provider running on each FlashArray controller called the VASA Service. The VASA Service is part of the core Purity Service, meaning that it automatically starts when Purity is running on that controller. In vSphere, the VASA Providers will be registered as Storage Providers. While Storage Providers/VASA Providers can manage multiple Storage Arrays, the Pure VASA Provider will only manage the FlashArray that it is running on. Even though the VASA Service is running and active on both controllers, vCenter will only use one VASA Provider as the active Storage Provider and the other VASA Provider will be the Standby Provider.
Here are some requirements and recommendations when working with the FlashArray VASA Provider.
- Requirement: Register both VASA Providers (CT0 and CT1).
- While it's possible to only register a single VASA provider, this leaves a single point of failure in your management path.
- Recommendation: Do not use an Active Directory user to register the storage providers.
- Should the AD service or server be running on vVols, Pure Storage strongly recommends not using an AD user to register the storage providers. Doing so leaves a single point of failure on the management path in the event that the AD user has its permissions or password changed, or the account is deleted.
- Recommendation: Use a local array admin user to register the storage providers.
- Recommendation: If the FlashArray is running Purity 5.3.6 or higher, import CA-signed certificates to VASA-CT0 and VASA-CT1.
Managed Snapshots for vVols based VMs
One of the core benefits of using vVols is the integration between storage and vSphere Managed Snapshots. Managed snapshot operations are offloaded to the FlashArray, and there is no performance penalty for keeping the managed snapshots. Because these operations are offloaded to VASA and the FlashArray, however, they create additional work on the FlashArray that does not exist with managed snapshots on VMFS.
Major improvements to vVols performance at scale and under load have been released with FlashArray VASA Provider 2.0.0 in Purity//FA 6.2 and 6.3.
Pure Storage's recommendation when using vVols with the FlashArray is to upgrade to Purity//FA 6.2.10 or higher.
Please see the KB What's new with VASA Provider 2.0.0 for more information.
Here are some points to keep in mind when using Managed Snapshots with vVols based VMs.
- Managed Snapshots for vVols based VMs create a volume for each Data vVol on that VM, with a -snap suffix in its name.
- Taking a managed snapshot of a vVol based VM first issues a Prepare Snapshot Virtual Volume operation, which causes VASA to create placeholder data-snap volumes. Once that completes, vSphere stuns the VM and sends the Snapshot Virtual Volume request. VASA then takes consistent point-in-time snapshots of each Data vVol and copies them out to the previously created placeholder volumes. Once the requests complete for every virtual disk, the VM is unstunned and the snapshot is complete.
- Because FA volumes are created for each managed snapshot, managed snapshots directly increase the volume count on the FlashArray. For example, a vVol VM with 5 VMDKs (Data vVols) creates 5 new volumes on the FA for each managed snapshot. If 3 managed snapshots are taken, this VM has a volume count on the FA of 21 volumes while powered off (1 Config vVol and 20 Data vVols) and 22 volumes while powered on (plus 1 Swap vVol).
- Managed Snapshots only trigger point-in-time snapshots of the Data vVols, not the Config vVol. If the VM is deleted and a recovery of the VM is needed, the recovery must be done manually from a pgroup snapshot.
- The process of VMware taking a managed snapshot is fairly serialized; specifically, the snapshotVirtualVolume operations are serialized. This means that if a VM has 3 VMDKs (Data vVols), the snapshotVirtualVolume request is issued for one VMDK, and only after it completes is the operation issued against the next VMDK. The more VMDKs a VM has, the longer the managed snapshot takes to complete, which can increase the stun time for that VM.
- VMware has committed to improving the performance of these calls from vSphere. In vSphere 7.0 U3, snapshotVirtualVolume was updated to use the maximum batch size advertised by VASA, so a single snapshotVirtualVolume call can include multiple Data vVols. When the number of virtual disks is greater than the maximum batch size, the multiple snapshotVirtualVolume calls for the same VM are now issued at close to the same time.
- Recommendation: Plan accordingly when setting up managed snapshots (scheduled or manual) and when configuring backup software that leverages managed snapshots for incremental backups. The size and the number of Data vVols per VM can affect how long the snapshotVirtualVolume operation takes and how long the VM is stunned.
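The volume arithmetic and the batching change described above can be sketched in a few lines. This is an illustrative model only: the function names are hypothetical, passing `None` as the batch size stands in for the older one-call-per-disk behavior, and the batch size in the example is a made-up value, not the one the FlashArray VASA Provider actually advertises.

```python
from typing import List, Optional


def fa_volume_count(data_vvols: int, managed_snapshots: int, powered_on: bool) -> int:
    """FA volumes for one vVol VM: Config + Data (+ -snap copies) + Swap if on."""
    config = 1
    data = data_vvols * (1 + managed_snapshots)  # each snapshot adds one volume per Data vVol
    swap = 1 if powered_on else 0
    return config + data + swap


def snapshot_call_batches(disks: List[str], max_batch_size: Optional[int]) -> List[List[str]]:
    """Group a VM's Data vVols into snapshotVirtualVolume calls.

    None models the serialized behavior (one call per disk, issued one after
    another); an integer models batching up to the advertised max batch size.
    """
    size = 1 if max_batch_size is None else max_batch_size
    if size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return [disks[i:i + size] for i in range(0, len(disks), size)]
```

With the example from the text, `fa_volume_count(5, 3, powered_on=True)` returns 22. For a VM with 10 virtual disks, the serialized model produces 10 sequential calls, while a hypothetical batch size of 4 produces 3 calls (4, 4 and 2 disks) that vSphere 7.0 U3 issues at close to the same time.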
Storage Policy Based Management (SPBM)
There are a few aspects of utilizing Storage Policies with vVols and the FlashArray to keep in mind when managing your vSphere Environment.
- Storage Policies can be compatible with one or multiple replication groups (FlashArray protection groups).
- While storage policies can be compatible with multiple replication groups, multiple replication groups should not be used when applying the policy to a VM. The VM should be part of a single consistency group.
- SPBM Failover workflow APIs are run against the replication group, not against the storage policy itself.
- Recommendation: Attempt to keep replication groups under 100 VMs. This helps limit how long the VASA operations issued against the policies and replication groups take to return.
- This includes both snapshot-enabled and replication-enabled protection groups. VASA operations such as queryReplicationGroup look up all objects in both local replication and snapshot pgroups, as well as in target protection groups. The more protection groups there are, and the more objects they contain, the longer these queries take. Please see vVols Deep Dive: Lifecycle of a VASA Operation for more information.
- Recommendation: Do not change the default storage policy with the vVols Datastore. This could cause issues in the vSphere UI when provisioning to the vVols Datastore.
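The two replication group guidelines above (one replication group per VM, fewer than 100 VMs per replication group) lend themselves to a simple audit. This minimal sketch assumes you have already exported a mapping of VM names to the replication groups their vVols use, for example from vCenter or the FlashArray; the input format and function name are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set

MAX_VMS_PER_GROUP = 100  # guidance above: keep replication groups under 100 VMs


def check_replication_layout(vm_groups: Dict[str, Set[str]]) -> List[str]:
    """Return warnings for VMs spanning several groups and for oversized groups."""
    warnings: List[str] = []
    vms_per_group: Counter = Counter()
    for vm, groups in sorted(vm_groups.items()):
        if len(groups) > 1:
            warnings.append(f"{vm}: spans {len(groups)} replication groups {sorted(groups)}")
        vms_per_group.update(groups)  # count each VM once per group it uses
    for group, count in sorted(vms_per_group.items()):
        if count >= MAX_VMS_PER_GROUP:
            warnings.append(f"{group}: {count} VMs; keep under {MAX_VMS_PER_GROUP}")
    return warnings
```

An empty list means the exported layout follows both guidelines; the check does not query vSphere or the FlashArray itself.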
FlashArray SafeMode with vVols
For FlashArrays with SafeMode enabled, additional considerations and planning are required for the best experience. Because storage is managed through VASA, the VASA service frequently creates, destroys and eradicates volumes, places volumes in and removes them from FlashArray protection groups, and disables snapshot/replication schedules.
For more detailed information on SafeMode with vVols see the User Guide. Here is a quick summary of recommendations when running vVols with SafeMode enabled on the FlashArray.
- A FlashArray used for vVols should be running Purity//FA 6.1.8 or higher before SafeMode is enabled.
- A vSphere environment running 7.0 U1 or higher is ideal, as it can leverage the allocated bitmap hint that is part of VASA 3.5.
- Object count, object count, object count. Seriously, the biggest impact of enabling SafeMode is on object count. Customers that want to enable SafeMode must plan to continually monitor the object counts for volumes, volume groups, volume snapshots and pgroup snapshots. Monitor not just current object counts but pending eradication object counts as well.
- Auto-RG should not be used for SPBM when assigning replication groups to a VM.
- Once a VM has a storage policy replication group assigned, VASA will be unable to assign a different replication group. With SafeMode enabled, plan on the vSphere admin being unable to change the storage policy and replication group once they are assigned.
- Failover replication group workflows will not be able to disable replication group schedules, and cleanup workflows will not be able to eradicate objects. Users must plan for higher object counts after any test or failover workflows.
- Environments that frequently power VMs on and off or vMotion them between hosts will have more swap vVols pending eradication. If the eradication timer is set longer than 24 hours, those swap vVols will remain pending eradication for longer. Storage and vSphere admins will have to plan around higher object counts in these environments.
- In some cases, vSphere admins may want to configure a VMFS Datastore that is shared between all hosts as the target for VM swap files.
- When changed block tracking (CBT) is enabled for the first time, the number of volume snapshots pending eradication increases. Backup workflows that periodically refresh CBT (disable and re-enable it) increase the number of volume diffs that are issued. Pure does not recommend refreshing CBT frequently; once enabled, CBT should not normally need to be refreshed.
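Because SafeMode planning centers on object counts, including objects pending eradication, a small monitoring helper can make the "live plus pending" rule concrete. This is a minimal sketch under stated assumptions: the counts and limits are values you collect yourself (the limits in any example are placeholders, not FlashArray specifications), the 80% warning ratio is an arbitrary choice, and the swap vVol estimate assumes, as a rough simplification, that each power on/off or vMotion event leaves one swap vVol pending eradication for the full eradication timer.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class ObjectCount:
    live: int
    pending_eradication: int

    @property
    def total(self) -> int:
        # Pending eradication objects still count until they are eradicated.
        return self.live + self.pending_eradication


def over_threshold(
    counts: Dict[str, ObjectCount], limits: Dict[str, int], warn_ratio: float = 0.8
) -> Dict[str, float]:
    """Return {object_type: fraction_of_limit} for types at or above warn_ratio."""
    flagged: Dict[str, float] = {}
    for kind, count in counts.items():
        limit = limits.get(kind)
        if not limit:
            continue  # no limit supplied for this object type
        used = count.total / limit
        if used >= warn_ratio:
            flagged[kind] = used
    return flagged


def estimated_pending_swap_vvols(events_per_hour: float, eradication_hours: float = 24.0) -> int:
    """Rough estimate under the one-swap-vVol-per-event assumption above."""
    return math.ceil(events_per_hour * eradication_hours)
```

Under that assumption, 10 power or vMotion events per hour leave about 240 swap vVols pending eradication with a 24 hour timer, and about 720 if the timer is raised to 72 hours, which illustrates why a longer eradication timer raises object counts.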