Purity 5.3 upgraded the backend database service on the FlashArray. A separate KB covers how upgrading to Purity 5.3 can impact vVols and how these backend changes help vVols and VASA overall. Another KB covers the lifecycle of a VASA operation and is worth reviewing, because the impact of the changes in VASA 1.1.0 and Purity 5.3 is easier to understand with that background. While the database service improvements are present in all versions of Purity 5.3, VASA did not take full advantage of them until 1.1.0 was released. Here are some of the core improvements made in VASA 1.1.0.
- Significant improvement to the performance of Bind and Unbind requests
- Optimized the processing of batched bind and unbind requests
- Optimized the overall bind request and virtual volume query operations
- Optimized the handling of Replication Group Queries and simplified the response back to vSphere
- An increase in overall resources for VASA to communicate to the FlashArray's backend database service
- Increased the number of task threads VASA has to the database service
- Increased the maximum number of connections that VASA has to the database service
- Optimized the Managed Snapshot deletion process to take less time to complete
- Improved the VASA service's memory management to address VASA out-of-memory (OOM) events
- Pure identified VASA processes whose memory usage could be optimized so that they stop causing OOM events in VASA
Just how much has performance improved with Purity 5.3.6 and VASA 1.1.0 compared to previous versions of Purity and VASA? Great question. Here are a couple of examples.
Performance Testing Examples
The same processes were tested on two FlashArrays, one running Purity 5.1.15 and the other running Purity 5.3.6. Each test environment had 1000 vVol based VMs. Both FlashArrays carried the same workload, about 50k IOPs, and showed similar load metrics.
The first test was to power on 100 VMs at once until all 1000 VMs were powered on, so 10 batches of 100 power on requests.
- On Purity 5.1.15 the process to power on all 1000 VMs took just over 3 hours to complete. The test was repeated several times and generally took between 2 1/2 and 3 hours.
- On Purity 5.3.6 the process to power on all 1000 VMs took just over 20 minutes. This test was also repeated several times and generally took between 18 and 22 minutes.
Where we really see the huge jump is doing these requests at high load and scale. For example, with the Purity 5.3.6 array, we cranked the workload up to 150k IOPs and the array was just over 70% load. The same process of powering on 1000 VMs took just 30 minutes.
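The batching pattern used in this test (10 batches of 100 requests) can be sketched as follows. This is only an illustration of the test shape, not the harness Pure used: the VM names and the `submit_batch` callable are hypothetical placeholders, and a real run would replace `submit_batch` with whatever issues the power-on requests to vCenter.

```python
import time
from typing import Callable, Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def run_in_batches(
    vm_names: Sequence[str],
    size: int,
    submit_batch: Callable[[List[str]], None],
) -> float:
    """Submit each batch in turn and return the total elapsed seconds."""
    started = time.monotonic()
    for batch in batches(vm_names, size):
        # Hypothetical hook: a real test would issue the batch's power-on
        # requests here and wait for them to complete.
        submit_batch(batch)
    return time.monotonic() - started


# Hypothetical VM names, only to show the 1000 VM / batch-of-100 shape.
vm_names = [f"vvol-vm-{i:04d}" for i in range(1, 1001)]
submitted: List[List[str]] = []
elapsed = run_in_batches(vm_names, 100, submitted.append)
print(f"{len(submitted)} batches submitted")
```

Passing the submit step in as a callable keeps the same loop usable for the other batched tests (snapshots, power off, destroy) by swapping in a different function.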
Another test measured how long it took to put a host into maintenance mode when 100+ vVols based VMs were powered on on that host.
- On Purity 5.1.15 the process to place a single host into maintenance mode took between 60 and 90 minutes.
- On Purity 5.3.6 the process to place a single host into maintenance mode took between 6 and 9 minutes.
- When we cranked the workload up to 150k IOPs and the array was at 70% load, it only took between 10 and 12 minutes.
The difference between the two performance tests is quite dramatic. The next test took managed snapshots of the vVols based VMs while all 1000 VMs were powered on. A managed snapshot was taken of 100 VMs at the same time, and this was repeated 10 times to cover all 1000 VMs. Once those requests finished, there was a 15 minute wait, after which the managed snapshots were destroyed in the same way, 100 VMs at a time.
- On Purity 5.1.15 the managed snapshot process took over 90 minutes, but only about 20% of the managed snapshots completed successfully.
- Destroying the managed snapshots took between 15 and 20 minutes.
Keep in mind that this only covers destroying the 20% of managed snapshot requests that completed.
- On Purity 5.3.6 the managed snapshot process took just over 6 minutes with 100% of the managed snapshots completing successfully.
- Destroying the managed snapshots took between 10 and 14 minutes.
- Then on Purity 5.3.6, testing at 150k IOPs and 70% load, the process took 20 minutes, but only 85% of the snapshot requests completed successfully. Keep in mind that all 1000 VMs are powered on, the array is under higher levels of load and 100 snapshot requests come in at the same time.
- Destroying the managed snapshots took between 12 and 15 minutes.
Once more we can see the significant difference between Purity 5.3.6 with VASA 1.1.0 and Purity 5.1.15 with VASA 1.0.2. Here is a table that summarizes some of the tests that were run:
|Tests at 50k IOPs and ~30% Load||Purity 5.1.15 Test Completion Time||Purity 5.3.6 Test Completion Time|
|Cloning 1000 VMs from Template in batches of 100||80 Minutes||20 Minutes|
|Powering on 1000 VMs in batches of 100||180 Minutes||20 Minutes|
|Taking Managed Snapshots of 1000 VMs in batches of 100||100 Minutes (20% Success Rate)||(100% Success Rate)|
|Destroying Managed Snapshots of 1000 VMs in batches of 100||20 Minutes (only destroying the 20% success)||(Destroying the 100% Success)|
|Placing a host into Maintenance mode with 100+ vVols based VMs powered on that host||60 Minutes||8 Minutes|
|Powering off 1000 VMs in batches of 100||90 Minutes||10 Minutes|
|Destroying 1000 VMs in batches of 100||60 Minutes||10 Minutes|
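To make the table easier to compare at a glance, the completion times can be turned into speedup factors. The short sketch below uses only the rows of the table above that list a time for both Purity versions; the two managed snapshot rows are left out because their Purity 5.3.6 times are not shown in the table. The shortened row labels are for illustration only.

```python
# (test, minutes on Purity 5.1.15, minutes on Purity 5.3.6), copied from the table.
results = [
    ("Cloning 1000 VMs from template", 80, 20),
    ("Powering on 1000 VMs", 180, 20),
    ("Host into maintenance mode", 60, 8),
    ("Powering off 1000 VMs", 90, 10),
    ("Destroying 1000 VMs", 60, 10),
]


def speedup(old_minutes: float, new_minutes: float) -> float:
    """Return how many times faster the newer run completed."""
    return old_minutes / new_minutes


for name, old, new in results:
    print(f"{name}: {old} min -> {new} min ({speedup(old, new):.1f}x faster)")
```

For these rows the factors range from 4x (cloning) to 9x (powering on and powering off).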
Additional tests are still being run, including with different tuning of the VASA Provider. The main takeaway here is that VASA performance is dramatically improved in VASA 1.1.0 and that, with a consistent performance baseline, Pure will be able to continue testing at differing scale and load on the FlashArray.
- VASA Provider version 1.1.0 fully supports array based replication for vVols in Site Recovery Manager. The requirements can be found here.
- The vSphere API Failover Replication Group no longer removes the source VMs' volumes and volume groups on the source FlashArray.
- When a Failback operation is run, the old protection group, volumes and volume groups will be re-used as part of the Failover Replication Group process.
- Please see the vVols and SRM documentation here for more information.
- Purity 5.3 introduced a new feature to protection group (pgroup) snapshots. Traditionally any pgroup snapshot that was manually issued had to be done from the source FlashArray. Now the pgroup snapshot can be initiated from the target FlashArray. Here is an example of initiating the snapshot replication from the target FlashArray.
Here is the pgroup that the example will use:
purepgroup list sn1-x70-b05-33:x70-1-policy-ac1-light-001
Name                                       Source          Targets           Host Groups  Hosts  Volumes
sn1-x70-b05-33:x70-1-policy-ac1-light-001  sn1-x70-b05-33  sn1-m20r2-c05-36  -            -      sn1-x70-b05-33:Config-01d38118
                                                                                                 sn1-x70-b05-33:Config-02e76fe3
                                                                                                 sn1-x70-b05-33:Config-035e45e6
                                                                                                 sn1-x70-b05-33:Config-05858f94
                                                                                                 sn1-x70-b05-33:Config-06c89486
These commands are being run from the target FlashArray, sn1-m20r2-c05-36.
Here is the syntax for issuing the replicate-now from target:
purepgroup snap --replicate-now -h
usage: purepgroup snap [-h] [--replicate | --replicate-now | --for-replication]
                       [--suffix SUFFIX] [--apply-retention] [--on ON]
                       PGROUP ...

positional arguments:
  PGROUP             protection group name

optional arguments:
  -h, --help         show this help message and exit
  --replicate        arrange for this snapshot to be replicated when the
                     replication schedule allows
  --replicate-now    replicate this snapshot to the specified targets
                     immediately
  --for-replication  this snapshot will be used for manual replication
                     request at a later time
  --suffix SUFFIX    snapshot suffix
  --apply-retention  this snapshot will be retained and eradicated by the
                     local and remote schedules
  --on ON            source of protection group
The optional --on argument specifies which array should initiate the replication job. Its value is the source FlashArray of the protection group.
purepgroup snap --replicate-now --on sn1-x70-b05-33 x70-1-policy-ac1-light-001
Name                            Source                      Created                  Remote
x70-1-policy-ac1-light-001.139  x70-1-policy-ac1-light-001  2020-06-04 10:47:53 PDT  sn1-x70-b05-33
Checking on the transfer for the pgroup, the new snapshot can be found being replicated to the target array.
purepgroup list sn1-x70-b05-33:x70-1-policy-ac1-light-001 --snap --transfer
Name                                           Source                                     Created                  Started                  Completed                Progress  Data Transferred  Physical Bytes Written
sn1-x70-b05-33:x70-1-policy-ac1-light-001.139  sn1-x70-b05-33:x70-1-policy-ac1-light-001  2020-06-04 10:47:53 PDT  2020-06-04 10:47:53 PDT  -                        2.06%     7.10M             4.50M
sn1-x70-b05-33:x70-1-policy-ac1-light-001.138  sn1-x70-b05-33:x70-1-policy-ac1-light-001  2020-06-04 10:02:00 PDT  2020-06-04 10:01:59 PDT  2020-06-04 10:03:39 PDT  100.00%   105.78M           100.88M
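For anyone scripting this workflow, the two commands above can be assembled programmatically. The sketch below is a minimal illustration in Python: it only builds the command strings, using the flags shown in the help output, and reads the Progress value from a line of transfer output. The helper names are hypothetical, and how the commands are delivered to the target FlashArray (for example over an SSH session) is left out and would need to be supplied.

```python
import shlex
from typing import List, Optional


def build_replicate_now(
    source_array: str, pgroup: str, suffix: Optional[str] = None
) -> List[str]:
    """Build 'purepgroup snap --replicate-now' as it is run on the target array."""
    cmd = ["purepgroup", "snap", "--replicate-now", "--on", source_array]
    if suffix:
        cmd += ["--suffix", suffix]
    cmd.append(pgroup)
    return cmd


def build_transfer_check(source_array: str, pgroup: str) -> List[str]:
    """Build the command that lists snapshot transfer progress on the target."""
    return ["purepgroup", "list", f"{source_array}:{pgroup}", "--snap", "--transfer"]


def transfer_progress(row: str) -> float:
    """Return the Progress column of one transfer row as a percentage.

    Assumes the Progress value is the only field in the row ending in '%'.
    """
    token = next(tok for tok in row.split() if tok.endswith("%"))
    return float(token.rstrip("%"))


source = "sn1-x70-b05-33"
pgroup = "x70-1-policy-ac1-light-001"
print(shlex.join(build_replicate_now(source, pgroup)))
print(shlex.join(build_transfer_check(source, pgroup)))

# One row copied from the transfer output shown above.
row = (
    "sn1-x70-b05-33:x70-1-policy-ac1-light-001.139 "
    "sn1-x70-b05-33:x70-1-policy-ac1-light-001 "
    "2020-06-04 10:47:53 PDT 2020-06-04 10:47:53 PDT - 2.06% 7.10M 4.50M"
)
print(transfer_progress(row))
```

Building the commands as argument lists and joining them with `shlex.join` (Python 3.8 or later) avoids quoting mistakes if a protection group name or suffix ever contains unusual characters.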
- With the new ability to initiate replication jobs from the target array, the VASA Provider now correctly handles the SyncReplicationGroup API. When a Sync is issued (from SRM, PowerCLI, Python, etc.), the VASA Provider now initiates the snapshot replication at the time of the request. Prior to VASA 1.1.0, the Sync call would wait for the next scheduled snapshot to be replicated before completing.
- With VASA 1.1.0, the FlashArray VASA Provider officially supports multiple vCenters that are not in Linked Mode. A KB detailing how to configure the VASA Provider for multi-vCenter environments can be found here.