A key component of any successful VMware implementation is acceptable performance. When performance falls short of expectations, the business is affected, which can ultimately hurt the bottom line. The challenge, of course, is that performance means different things to different people and can be very hard to both define and diagnose. This article is by no means an exhaustive list of options, but it should provide some useful places to look when confirming and troubleshooting performance issues between Azure VMware Solution (AVS) and Pure Cloud Block Store (CBS).
Poor performance can lead to workflows taking longer than expected, a poor user experience, and production environments experiencing unexpected downtime or slowness. In rarer cases, an event such as ESXi host CPU utilization hitting 100% or extremely high latency on CBS can lead to a more severe production outage.
Verify AVS and CBS Integration Best Practices are Applied
There are a few important best practices that must be followed for AVS and CBS to run with maximum performance. They are outlined here, along with how to verify that each one is in use.
Confirm AVS and CBS are in the same Azure Availability Zone (AZ)
A given Azure Region often has multiple Availability Zones (AZs). Azure availability zones are physically separated datacenters, each with its own independent power source, network, and cooling. Because AZs are physically separated from one another within the same region, an important performance consideration is to make certain that AVS and CBS are deployed into the same AZ, which minimizes the number of network hops between the two solutions.
An important item to note is that AZ numbering is consistent only within the context of a single subscription. That is, AZs are logically grouped and randomized on a per-subscription basis, so there is no guarantee that AZ1 for Subscription ID1 is the same physical zone as AZ1 for Subscription ID2.
To confirm that AVS and CBS are deployed inside of the same AZ, follow these steps:
To check the AVS AZ in use, navigate to the AVS object inside of the Azure portal. From the Overview screen, the AZ can be found as shown in the below screenshot:
To check which AZ CBS is utilizing, navigate to the CBS object in the Azure portal, select the CBS Managed Application and navigate to Parameters and Outputs on the left menu. Scroll down until you find the zone.
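When AVS and CBS live in different subscriptions, matching zone numbers alone is not enough, because each subscription has its own logical-to-physical zone mapping. The following is a minimal sketch of that comparison. The mapping shape (a list of `logicalZone` / `physicalZone` pairs) is an assumption modeled on how Azure reports zone mappings per subscription, and the zone names and sample data are hypothetical.

```python
def physical_zone(zone_mappings, logical_zone):
    """Return the physical zone behind a logical zone number, or None."""
    for entry in zone_mappings:
        if entry["logicalZone"] == str(logical_zone):
            return entry["physicalZone"]
    return None


def same_physical_zone(avs_mappings, avs_zone, cbs_mappings, cbs_zone):
    """True only when both logical zones resolve to the same physical zone."""
    avs_physical = physical_zone(avs_mappings, avs_zone)
    cbs_physical = physical_zone(cbs_mappings, cbs_zone)
    return avs_physical is not None and avs_physical == cbs_physical


# Hypothetical mappings for two subscriptions in the same region.
avs_subscription = [
    {"logicalZone": "1", "physicalZone": "region-az2"},
    {"logicalZone": "2", "physicalZone": "region-az3"},
    {"logicalZone": "3", "physicalZone": "region-az1"},
]
cbs_subscription = [
    {"logicalZone": "1", "physicalZone": "region-az1"},
    {"logicalZone": "2", "physicalZone": "region-az2"},
    {"logicalZone": "3", "physicalZone": "region-az3"},
]

# Both report "zone 1", but they are different physical zones.
print(same_physical_zone(avs_subscription, 1, cbs_subscription, 1))  # False
# AVS zone 1 and CBS zone 2 share a physical zone in this sample data.
print(same_physical_zone(avs_subscription, 1, cbs_subscription, 2))  # True
```

If both resources are in the same subscription, the zone numbers shown in the portal can be compared directly and no mapping lookup is needed.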
Confirm an Ultra SKU Virtual Network Gateway is in use
The ExpressRoute connection between AVS and CBS is made through a Virtual Network Gateway. Several SKUs are available for the Virtual Network Gateway, and it is strongly recommended to use the Ultra Performance (or ErGw3AZ) SKU, as it provides the highest throughput at the lowest available latency. Furthermore, this is the only SKU that offers the FastPath feature, which is covered in the next section.
To confirm that the Ultra SKU gateway is in use, navigate to the virtual network gateway and find the SKU in the overview section. If the Ultra SKU is not in place, there are different upgrade options available to get to it.
Confirm FastPath is in use
FastPath is a feature of the Ultra SKU Virtual Network Gateway that allows traffic from AVS virtual machines hosted on a CBS datastore to bypass the Virtual Network Gateway. The result is a much more direct network connection (fewer network hops) and lower latency between the two solutions.
FastPath is enabled when the connection is established between AVS and CBS via the Virtual Network Gateway. To confirm it is in use, find the AVS and CBS vNET gateway connection in the Azure portal, and inspect the properties of the connection, confirming the FastPath box is checked.
If FastPath is not enabled, check the box and click Save. If FastPath is not showing as available, consider upgrading to the ErGw3AZ / Ultra Performance SKU Virtual Network Gateway.
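The gateway SKU and FastPath checks above can also be expressed as a small validation routine, which is handy when auditing several environments. This is only a sketch: it assumes the gateway and connection properties have already been exported to dictionaries, and the field names (`sku.name`, `expressRouteGatewayBypass`) and SKU strings are assumptions based on how Azure typically represents these resources. Verify them against your own exported data before relying on the result.

```python
# SKU names assumed to support FastPath (assumption, verify for your region).
FASTPATH_CAPABLE_SKUS = {"UltraPerformance", "ErGw3AZ"}


def check_gateway(gateway, connection):
    """Return a list of findings; an empty list means both checks passed."""
    findings = []

    sku = gateway.get("sku", {}).get("name")
    if sku not in FASTPATH_CAPABLE_SKUS:
        findings.append(
            f"Gateway SKU is {sku!r}; expected one of {sorted(FASTPATH_CAPABLE_SKUS)}"
        )

    # FastPath is assumed to appear as a boolean on the connection resource.
    if not connection.get("expressRouteGatewayBypass", False):
        findings.append("FastPath is not enabled on the gateway connection")

    return findings


# Hypothetical exported properties for illustration.
gateway = {"name": "avs-cbs-gw", "sku": {"name": "Standard"}}
connection = {"name": "avs-cbs-conn", "expressRouteGatewayBypass": False}

for finding in check_gateway(gateway, connection):
    print(finding)
```

Returning a list of findings rather than a single boolean keeps both problems visible at once, since a gateway on the wrong SKU will usually also have FastPath disabled.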
Inspect Volume Performance on CBS
Pure Cloud Block Store offers in-depth storage performance metrics that can be accessed via the CBS GUI. Pay particularly close attention to the different kinds of latency and where each one originates.
As a primer, CBS reports on the following different types of latency:
- SAN Time: Average time, measured in milliseconds, required to transfer data between the initiator and the array. SAN times are only displayed in graphs of one I/O type, such as Read, Write, or Mirrored Write.
- QoS Rate Limit Time: Average time, measured in microseconds, that all I/O requests spend in queue as a result of bandwidth limits reached on one or more volumes. QoS rate limit times are only displayed in graphs of one I/O type, such as Read, Write, or Mirrored Write.
- Queue Time: Average time, measured in microseconds, that an I/O request spends in the array waiting to be served. The time is averaged across all I/Os of the selected types.
- Read/Write Latency: Average arrival-to-completion time, measured in milliseconds, for a read or write operation.
Array-level and volume-level performance metrics can be found by logging into the CBS GUI, selecting Analysis, and then Performance. Select the Volumes button (3) to see the list of volumes on the array. One or more volumes can be selected (4) for review. Select Read or Write (5) to see a breakdown of the individual latency components outlined above.
This example shows a CBS array with a poor networking configuration (very high SAN time). In situations like this, ensure that all of the AVS and CBS best practices in the previous section have been applied. If they have and the SAN time is still high, open a support ticket with Pure Storage or contact your account team for assistance. What counts as high SAN time depends on the customer environment and other factors, but anything over several milliseconds should be considered high and investigated further.
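Because the latency components are reported in different units (SAN time in milliseconds, QoS rate limit and queue time in microseconds), it helps to normalize them before comparing. The sketch below does that and flags high SAN time. The 3 ms threshold is an assumption standing in for "several milliseconds" and should be tuned to the environment, and the sample values are hypothetical.

```python
# Assumed cut-off for "several milliseconds"; adjust for your environment.
SAN_TIME_THRESHOLD_MS = 3.0


def latency_breakdown(san_ms, qos_rate_limit_us, queue_us):
    """Normalize latency components to milliseconds and flag high SAN time."""
    components_ms = {
        "san": san_ms,
        "qos_rate_limit": qos_rate_limit_us / 1000.0,  # microseconds to ms
        "queue": queue_us / 1000.0,                    # microseconds to ms
    }
    # The component contributing the most time is the first place to look.
    dominant = max(components_ms, key=components_ms.get)
    return {
        "components_ms": components_ms,
        "dominant": dominant,
        "san_time_high": san_ms > SAN_TIME_THRESHOLD_MS,
    }


# Hypothetical readings taken from the volume latency breakdown.
result = latency_breakdown(san_ms=8.2, qos_rate_limit_us=150, queue_us=400)
print(result["dominant"])       # san
print(result["san_time_high"])  # True
```

A dominant `san` component points toward the network path between AVS and CBS, while a dominant `qos_rate_limit` component suggests a bandwidth limit on one or more volumes is being reached.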
Deploy VMAnalytics for VM-level Performance Metrics
VMAnalytics is a free tool provided by Pure Storage that provides granular, per-virtual machine performance metrics throughout the entire ESXi stack. This tool is deployed via the Pure Storage VMware OVA. These are the instructions on how to deploy the OVA and register it against your AVS vCenter.
Highlights of the offering include:
- AVS stores performance (and other) metrics in 20-second intervals for up to the last hour, for the entire AVS environment.
- The VMAnalytics metrics are stored, processed and displayed to the user in Pure1 Manage.
- Users can see metrics such as Latency, IOPS, Bandwidth across all of their arrays, including on-premises FlashArrays in addition to CBS.
As shown in the screenshot below, this tool enables the user to highlight a specific VM or a specific metric and view where latency, bandwidth, or other important metrics are being introduced across the entire VMware stack.