SafeMode with vVols User Guide
So how does SafeMode impact vVols? This comes back to what SafeMode does. Enabling SafeMode on the FlashArray disables the ability to eradicate destroyed volumes and destroyed snapshots, makes the Eradication Timer adjustable (the default is 24 hours), disables the ability to remove volumes from protection groups, disables the ability to turn off snapshot/replication schedules, and disables the ability to shorten the protection group retention period. As a result, the VASA service can no longer automatically eradicate objects, edit protection group schedules (even for newly created protection groups), disable replication/snapshot schedules, or remove volumes from protection groups.
Workflows Impacted by SafeMode
Knowing what SafeMode does and how it impacts the VASA service, we can look at which workflows and features are impacted by enabling SafeMode on the FlashArray.
VASA Workflow or Feature Impacted | What is the Impact and the Side Effects |
---|---|
Automatically destroying and eradicating Swap Volumes | When a virtual machine is powered off, VASA automatically destroys and eradicates the swap volume for the VM. When a VM is vMotioned between hosts, a temporary swap vVol is created for that process and then destroyed and eradicated. With SafeMode enabled, VASA is no longer able to automatically eradicate the swap vVols, leading to higher object counts. |
Automatically destroying and eradicating managed snapshot objects | Normally, VASA automatically destroys and eradicates managed snapshot objects when vSphere deletes a managed snapshot. With SafeMode enabled, VASA is no longer able to automatically eradicate the managed snapshot objects. |
Using the auto-rg option when assigning replication groups with SPBM | With SafeMode enabled, VASA cannot change the protection group schedules to be greater than the default schedules. At this time, Pure recommends not using the auto-rg option when SafeMode is enabled, as its behavior will be inconsistent. |
Test failover replication group cleanup workflows | When a testFailoverReplicationGroupStop is issued to VASA, VASA automatically destroys and eradicates all volumes and volume groups that were used for the test. With SafeMode enabled, VASA is no longer able to eradicate those objects. They are still destroyed, but remain pending eradication. This leads to increased object counts on the recovery array. |
Failover replication group workflows cannot disable the replication schedule on the source replication group | During a failoverReplicationGroup workflow, VASA disables the replication schedule on the source replication group, and when reverseReplicationGroup is called, the new protected replication group has its replication schedule enabled. When SafeMode is enabled, VASA is unable to disable the source schedule, which in turn causes additional snapshots to be replicated and object counts to increase. |
When Changed Block Tracking (CBT) is initially enabled, eradication of the array volume snapshots used for volume diffs is disabled | When changed block tracking is initially enabled, vSphere queries the VASA provider to find out which blocks are allocated for each virtual disk. This helps vSphere build out tracking files as part of enabling CBT. Currently, VASA creates volume snapshots and performs diffs against them to find the allocated blocks. This is generally done by scanning segments of 128 GB in length, which means that for a single 1 TB data vVol, VASA needs to create 8 volume snapshots. A virtual machine with 15 TB worth of virtual disks will have around 120 volume snapshots created in order to enable CBT. These volume snapshots will not be eradicated automatically when SafeMode is enabled. Features from vSphere 7.0 U1+ help by allowing the VASA provider to provide next allocated block hints as part of the allocated block queries. In particular, this helps with large, sparsely filled virtual disks, and can decrease the number of volume snapshots for 15 TB worth of virtual disks from 120 to around 15. Keep in mind that this applies only when the initial managed snapshot is taken after enabling CBT. Future managed snapshots do not need to build out the allocated block tracking and do not create the volume snapshots as part of the managed snapshot process. |
During a storage vMotion from vVols to VMFS, eradication of the array volume snapshots used for volume diffs is disabled | Similar to the CBT enabling workflow, when a storage vMotion from vVols to VMFS is performed, vSphere issues requests to VASA to find out which blocks are allocated. This workflow uses much smaller segment lengths, though, and so creates a higher number of volume snapshots. The bitmap hint feature in vSphere 7.0 U1+ helps here as well. When SafeMode is enabled, a storage vMotion from vVols to VMFS will create many volume snapshots that will not be eradicated. Customers will have to plan and monitor accordingly if storage vMotions from vVols to VMFS need to be performed. |
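
The segment arithmetic described for the initial CBT enablement can be sketched as follows. This is a minimal illustration only, assuming one volume snapshot per 128 GB segment of each virtual disk and 1 TB = 1024 GB; the function name and constant are hypothetical and are not part of VASA or Purity, and the estimate does not model the allocated block hints available with vSphere 7.0 U1+.

```python
import math

# Segment length used when scanning for allocated blocks, per the table above.
SEGMENT_GB = 128


def cbt_snapshot_estimate(disk_sizes_gb, segment_gb=SEGMENT_GB):
    """Estimate the volume snapshots created when CBT is first enabled.

    Assumption: one volume snapshot per segment of each virtual disk,
    with no allocated-block hints to skip unallocated regions.
    """
    return sum(math.ceil(size / segment_gb) for size in disk_sizes_gb)


# A single 1 TB data vVol, then a VM with fifteen 1 TB virtual disks.
single_disk = cbt_snapshot_estimate([1024])
fifteen_tb_vm = cbt_snapshot_estimate([1024] * 15)
print(single_disk, fifteen_tb_vm)
```

With SafeMode enabled, each of these snapshots stays on the array as a destroyed object until the eradication timer expires, so an estimate like this is most useful when deciding how many VMs to enable CBT on at once.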
Planning and Recommendations
Now that the impacted workflows and features are out in the open, here is what needs to be planned for.
- Any FlashArray using vVols should be running Purity 6.1.8 or higher before SafeMode is enabled.
- A vSphere environment running 7.0 U1 or higher is ideal, as it can leverage the allocated bitmap hint that is part of VASA 3.5.
- Object count, object count, object count. Seriously, the biggest impact that enabling SafeMode will have is on object count. Customers that want to enable SafeMode must plan to continuously monitor the object counts for volumes, volume groups, volume snapshots and pgroup snapshots. Monitor not just the current object counts but all pending eradication object counts as well.
- The auto-rg option should not be used for SPBM when assigning replication groups to a VM.
- Once a VM has a storage policy replication group assigned, VASA will be unable to assign a different replication group. Plan for the vSphere admin being unable to change the storage policy and replication group assignment once it is made with SafeMode enabled.
- Failover replication group workflows will not be able to disable replication group schedules, nor will cleanup workflows be able to eradicate objects. Users must plan for higher object counts after any test or failover workflows.
- Environments that frequently power VMs on and off or vMotion them between hosts will have a higher number of swap vVols pending eradication. If the eradication timer is changed to be longer than 24 hours, those swap vVols will be pending eradication for a longer time. Storage and vSphere admins will have to plan around higher object counts in these environments.
- In some cases, vSphere admins may want to configure a VMFS datastore that is shared between all hosts as the target for VM swap.
- When changed block tracking (CBT) is enabled for the first time, the number of volume snapshots pending eradication will increase. Backup workflows that periodically refresh CBT (disable and re-enable CBT) will increase the number of volume diffs that are issued. Pure does not recommend refreshing CBT frequently. Once enabled, CBT should not normally need to be refreshed.
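
The monitoring recommendation above can be sketched as a simple headroom check that counts pending eradication objects alongside live ones. This is a minimal sketch under stated assumptions: the class, function names, the 80% threshold, and the example numbers are all hypothetical, the limits shown are placeholders rather than real Purity limits, and the counts would have to be collected from the array by whatever tooling is already in use.

```python
from dataclasses import dataclass


@dataclass
class ObjectCount:
    live: int                 # objects currently in use
    pending_eradication: int  # destroyed objects waiting on the eradication timer
    limit: int                # placeholder limit; look up the real one for your Purity version

    @property
    def total(self):
        # Pending eradication objects still count against the array.
        return self.live + self.pending_eradication

    @property
    def utilization(self):
        return self.total / self.limit


def over_threshold(counts, threshold=0.8):
    """Return the object types whose total count is at or above the threshold."""
    return [name for name, c in counts.items() if c.utilization >= threshold]


def pending_swap_vvols(events_per_hour, eradication_timer_hours=24):
    """Rough steady-state estimate of swap vVols pending eradication.

    Assumption: every power-off or vMotion leaves one destroyed swap vVol
    that remains on the array for the full eradication timer.
    """
    return events_per_hour * eradication_timer_hours


# Illustrative numbers only.
counts = {
    "volumes": ObjectCount(live=4000, pending_eradication=4500, limit=10000),
    "volume_snapshots": ObjectCount(live=1000, pending_eradication=500, limit=10000),
}
print(over_threshold(counts))
print(pending_swap_vvols(events_per_hour=10))
```

The point of the sketch is that an object type can look healthy on live counts alone and still be close to its limit once pending eradication objects are included, and that lengthening the eradication timer scales the pending swap vVol estimate proportionally.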
Looking Forward
Pure Storage recognizes that the current compatibility between vVols and SafeMode is not where we want it to be. While there isn't a specific timeline for when these features will be released, here is what Pure is working on.
- Improving the method and workflow that enabling CBT uses when querying the allocated bitmaps for the data vVols. The current implementation takes volume snapshots in order to perform the volume diffs. In the future, the volume diffs for the allocated bitmap calls will be performed against the data vVol itself. This will decrease the impact that SafeMode has on volume snapshot object counts.
- Increasing volume group, volume and volume snapshot object limits. While some of this work has already been done in Purity//FA 6.1 and more is being done for Purity//FA 6.2, there will be continued work by Pure Storage on object limits.
- Allowing VASA to update protection group schedules when there are no objects in the protection group. This will allow the use of auto-rg when assigning replication groups.