vVols Deep Dive: Lifecycle of a VASA Operation
This deep dive first covers the terminology used throughout the article. It then gives an overview of what a VASA operation is and walks through the lifecycle of that operation. The article concludes with a discussion of how the performance of a VASA operation can be impacted.
Since many of these terms will be abbreviated and used as acronyms, it's important to go over what they mean before we dive deep into the topic. Some of the concepts/terms have a couple of names, so both are used in those cases.
|Protocol Endpoint (PE)||A PE is a volume of zero capacity with a special setting in its Vital Product Data (VPD) page that ESXi detects during a SCSI inquiry. The PE effectively serves as a mount point for vVols. A PE is the only FlashArray volume that must be manually connected to hosts to use vVols. The industry term for a PE is "Administrative Logical Unit".|
|VASA||vSphere APIs for Storage Awareness (VASA) is the VMware-designed API used for communication between vSphere and the underlying storage; in Pure Storage's case, the FlashArray.|
|SOAP||Before REST APIs became widely used, SOAP (Simple Object Access Protocol) was the messaging protocol commonly used to exchange structured data via web services (HTTP). SOAP uses an XML structure to exchange information between source and destination. SOAP is heavily used in the management communication of the vSphere environment and vCenter services and, most importantly for the purpose of this KB, VASA.|
|Management Path (Control Path)||This is the TCP/IP path between the compute management layer (vSphere) and the storage management layer (FlashArray). Requests to create, delete, and otherwise manage storage are issued on this path. For the FlashArray VASA Provider, this is done via HTTPS and TLS 1.2 over port 8084.|
|Data Path||The Data Path is the established connection from the ESXi hosts to the Protocol Endpoint on the FlashArray. It is the path over which SCSI ops are sent and received, just as with any traditional SAN. This connection is established over the storage fabric. Today this means iSCSI or Fibre Channel.|
|SPBM||Storage Policy Based Management (SPBM) is a framework designed by VMware to provision and/or manage storage. Users can create policies of selected capabilities or tags and assign them to a VM or a specific virtual disk. SPBM for internal storage is called vSAN; SPBM for external storage is called vVols. A vendor must support VASA to enable SPBM for their storage.|
|VASA Provider (Storage Provider)||A VASA Provider is an instance of the VASA service that a storage vendor offers and that is deployed in the customer's environment. For the FlashArray, the VASA Providers are built into the FlashArray controllers and are represented as VASA-CT0 and VASA-CT1. The term Storage Provider is used in vCenter to represent the VASA Providers for a given FlashArray.|
|Virtual Volume (vVol)||Virtual Volumes (vVols) is the name for this full architecture. A specific vVol is any volume on the array that is in use by the vSphere environment and managed by the VASA provider. A vVol-based volume is not fundamentally different from any other volume on the FlashArray. The main distinction is that when it is in use, it is attached as a Sub-LUN via a PE instead of via a direct LUN.|
|vVol Datastore (vVol Storage Container)||The vVol Datastore is not a LUN, file system or volume. A vVol Datastore is a target provisioning object that represents a FlashArray, a quota for capacity, and is a logical collection of config vVols. While the object created in vCenter is represented as a Datastore, the vVol Datastore is really a Storage Container that represents that given FlashArray.|
|SPS||This is a vCenter daemon called Storage Policy Service (SPS or vmware-sps). The SMS and SPBM services run as part of the Storage Policy Service.|
|SMS||This is a vCenter Service called Storage Management Service (SMS).|
|vvold||This is the service running in ESXi that handles the management requests sent directly from the ESXi host to the VASA provider. It also communicates with the vCenter SMS service to get the Storage Provider information.|
|Here is a generic high-level view of the vVol architecture. Note that the Control/Management Path is separate from the Data Path.|
While a VASA Operation has more to do with the Management Path, an understanding of the overall architecture helps clarify how the performance of a VASA operation can be impacted.
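Because the Management Path is ordinary TCP/IP, a reasonable first check when VASA operations misbehave is whether the VASA Provider's port can be reached at all. The following is a minimal sketch, assuming Python on a machine with network access to the array. The function name `management_path_reachable` is hypothetical; only port 8084 comes from the definitions above. It tests TCP reachability only and does not validate the TLS session, certificates, or the VASA service itself.

```python
import socket


def management_path_reachable(host, port=8084, timeout=3.0,
                              connect=socket.create_connection):
    """Return True if a TCP connection to host:port opens within `timeout`.

    `connect` is injectable so the check can be exercised without a real
    array; by default it is the standard library's socket.create_connection.
    """
    try:
        # The connection is closed again as soon as the with-block exits.
        with connect((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, DNS failures and timeouts.
        return False
```

Calling `management_path_reachable("<controller management address>")` from the vCenter Server or a jump host gives a quick yes/no answer before digging into SPS or vvold logs.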
What is a VASA Operation?
As the name implies, a VASA Operation is an operation issued to a VASA Provider that manages one or more storage arrays; in the case of the FlashArray, a single array. These operations are issued from either vCenter's SPS or ESXi's vvold service. The request is sent as a SOAP call to the Active Storage Provider.
Once the VASA Provider has received the request, it determines whether the request can be satisfied within VASA itself (such as a GetEvents or GetAlarms request). If the request requires VASA to perform an action on the storage or to query something from the array, VASA forwards the request to the FlashArray DB service as a REST request.
When the request is not a query/lookup that the DB service can process on its own, but instead performs operations on the storage, the request is forwarded to the FlashArray Core Services. Once Purity has completed the request, a response is sent back to the DB service, which then passes the success response back to VASA. At this point, the response is formatted as SOAP and sent back to SPS or vvold.
Here is a quick table that outlines this flow (a better illustration is planned for the future):
|vCenter Server SPS||=>||FlashArray VASA Service||=>||FlashArray DB Service||=>||FlashArray Core Service|
|vCenter Server SPS||<=||FlashArray VASA Service||<=||FlashArray DB Service||<=||FlashArray Core Service|
|ESXi vvold Process||=>||FlashArray VASA Service||=>||FlashArray DB Service||=>||FlashArray Core Service|
|ESXi vvold Process||<=||FlashArray VASA Service||<=||FlashArray DB Service||<=||FlashArray Core Service|
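The routing decision behind the table above can be sketched as a small dispatcher. This is an illustrative model only, not the FlashArray implementation: apart from GetEvents and GetAlarms, the operation names are used purely as examples, and the two sets and both functions are hypothetical.

```python
# Toy model of how far a VASA operation travels along the management path.
# The operation-to-tier mapping below is an assumption for illustration.

VASA_LOCAL_OPS = {"GetEvents", "GetAlarms"}   # answered by VASA itself (per the article)
DB_QUERY_OPS = {"QueryVolumeInfo"}            # hypothetical lookup the DB service can answer
# Anything else is treated as a storage-changing op that reaches the Core Service.


def route(op_name):
    """Return the ordered list of FlashArray services that handle `op_name`."""
    path = ["VASA Service"]
    if op_name in VASA_LOCAL_OPS:
        return path
    path.append("DB Service")
    if op_name in DB_QUERY_OPS:
        return path
    path.append("Core Service")
    return path


def round_trip(issuer, op_name):
    """Request path from the issuer, followed by the mirrored response path."""
    outbound = [issuer] + route(op_name)
    return outbound + outbound[-2::-1]


print(" -> ".join(round_trip("ESXi vvold", "BindVirtualVolume")))
print(" -> ".join(round_trip("vCenter SPS", "GetAlarms")))
```

The point of the model is that the deeper a request has to travel, the more services contribute to its total completion time.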
Now that we have a better understanding of what a VASA Operation is, here is an example of a VASA Op that connects a config vVol and a data vVol to an ESXi host:
- The ESXi host issues a bind request for a Config vVol and Data vVol to the VASA provider as a SOAP request
- The VASA Provider receives the SOAP request to Bind a Config vVol and Data vVol to the ESXi host
- VASA creates a REST request to issue to the FlashArray DB service
- The FlashArray DB service receives the REST request and then forwards that request to Core Purity
- In Purity, the Config vVol and Data vVol are connected to the ESXi host that sent the request
- Core Purity responds back to the FlashArray DB service with a success
- The FlashArray DB Service sends the successful response back to the VASA Service
- VASA formats a SOAP response with the successful op and sends it back to the ESXi host that sent the request
- ESXi receives the successful SOAP response and the request is marked as completed
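To make the first step more concrete, the sketch below builds a SOAP-style XML envelope with Python's standard library. Only the SOAP 1.1 envelope namespace is standard. The `BindRequest` element, its children, and the identifiers are hypothetical placeholders and do not reflect the actual VASA schema.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_bind_request(host_id, vvol_ids):
    """Build an illustrative SOAP envelope asking to bind vVols to a host.

    Element names inside the Body are made up for this example.
    """
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    request = ET.SubElement(body, "BindRequest")         # hypothetical element
    ET.SubElement(request, "HostId").text = host_id      # hypothetical element
    for vvol_id in vvol_ids:
        ET.SubElement(request, "VvolId").text = vvol_id  # hypothetical element
    return ET.tostring(envelope, encoding="unicode")


xml_text = build_bind_request("esxi-01", ["config-vvol-1", "data-vvol-1"])
print(xml_text)
```

The response in the final steps travels back in the same kind of envelope, with a result element in the Body in place of the request.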
Depending on the factors covered in the following section, this VASA op could take a few seconds or as long as 30 to 60 seconds. Generally, individual VASA ops complete in 3 to 9 seconds. Each operation involves a varying amount of work that VASA needs to forward to the DB or Core services. This ranges from a GetAlarms call, which only queries the VASA Provider and returns a response in milliseconds, to a Failover Replication Group call, which includes copying out the volumes from snapshots, updating metadata for those volumes, creating vgroups, and so on.
Performance of a VASA Operation
With the flow of a VASA Operation understood, it is easier to see how the performance of a VASA Op can be impacted. There are a few points along the flow that affect how long a VASA Op takes to complete. This KB breaks them down into four areas:
- vCenter SPS or ESXi vvold
- FlashArray VASA Service
- FlashArray DB Service
- FlashArray Core Service
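One way to reason about these four areas is to treat the end-to-end duration of a VASA Op as the sum of the time spent in each one. The sketch below is a toy model. The stage timings are invented sample values, not measurements from a FlashArray, and the `summarize` helper is hypothetical.

```python
# Toy latency breakdown for one VASA Op across the four areas.
# All timings are made-up sample values in seconds.

STAGES = ("issuer (SPS or vvold)", "VASA Service", "DB Service", "Core Service")


def summarize(stage_seconds):
    """Return (total_seconds, slowest_stage) for a dict of per-stage timings."""
    missing = [s for s in STAGES if s not in stage_seconds]
    if missing:
        raise ValueError(f"missing timings for: {missing}")
    total = sum(stage_seconds[s] for s in STAGES)
    slowest = max(STAGES, key=lambda s: stage_seconds[s])
    return total, slowest


sample = {
    "issuer (SPS or vvold)": 0.25,
    "VASA Service": 0.5,
    "DB Service": 1.25,
    "Core Service": 2.0,
}
total, slowest = summarize(sample)
print(f"total={total:.2f}s, slowest stage: {slowest}")
```

Breaking an observed duration down this way shows which of the following sections is the most relevant place to look first.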
SPS and vvold Services
While issues in SPS or the vvold service don't always directly impact the performance of a VASA Op, there are some factors to consider.
- Responsiveness of the vCenter Server's vmware-sps service or ESXi's vvold service
  - If the service is locked or resource-strained, it takes longer for the service to send the request or to process the response to a completed request.
  - Additionally, if these services are slow to respond, requests can queue up on the vSphere side.
- Management path disruption from the vCenter Server's vmware-sps service or ESXi's vvold service
  - If there is a network issue, or the service fails or locks up, issued requests may fail right away or take 60+ seconds to fail.
FlashArray VASA Service
The pressure points for the VASA service on the FlashArray are how well VASA handles incoming requests, forwards those requests to the DB service, receives the responses from the DB service, and then sends the responses back to vSphere.
- VASA processing incoming VASA Ops from the vSphere environment
  - If all of the hosts and vCenters connected to VASA are sending as many VASA Ops as they can, the VASA service could be receiving anywhere from tens to thousands of requests at once. The more requests issued to VASA at once, the longer VASA takes to process them.
- VASA forwarding the Ops to the FlashArray DB Service
  - VASA ops differ in complexity and, depending on the operation, multiple REST requests may be forwarded to the DB Service. For example, running an SPBM Replication Group Failover forwards many REST requests to the DB Service for just one VASA Op.
- VASA processing the response from the FlashArray DB Service
  - In some cases there are several responses from the DB service that VASA needs to process and assemble into a single SOAP response to the issuer of the VASA Op.
- VASA sending the responses back to the vSphere environment
  - Any impact here is generally related to the connectivity between VASA and the issuer, either network related or session authentication related.
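The effect of request volume can be illustrated with a back-of-the-envelope model: if a service works on a fixed number of requests at a time, a burst is drained in waves. The worker count and per-request time below are hypothetical parameters chosen for illustration. The article does not state how the VASA service schedules work internally.

```python
import math


def time_to_drain(pending_requests, workers, seconds_per_request):
    """Estimate seconds until the last of `pending_requests` completes.

    Assumes (for illustration only) a fixed pool of `workers`, each handling
    one request at a time, with every request taking the same time.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    waves = math.ceil(pending_requests / workers)
    return waves * seconds_per_request


for burst in (10, 100, 1000):
    seconds = time_to_drain(burst, workers=8, seconds_per_request=0.5)
    print(f"{burst} requests -> {seconds} s")
```

Even in this simplified model, the request at the back of a large burst waits far longer than any single operation takes, which is why a storm of VASA Ops shows up as slow individual operations.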
FlashArray DB Service
For the DB service, the pressure points come down to the overall load on the DB service and the scope of each request sent to it.
- Load on the FlashArray Database Service
  - While this can be related to the overall load of the FlashArray, there are also cases where a heavy API workload is being issued to the FlashArray, which the Database service must handle before forwarding the requests to the FlashArray.
- Scope of the VASA request that the Database Service is handling
  - Not all requests from VASA to the FlashArray Database service are the same in scope. Some VASA Ops are just a few API calls to the DB service, while others can be several hundred API calls.
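The difference in scope can be made tangible by expanding a single operation into the calls it implies. The sketch below uses the failover work mentioned earlier (copying volumes out of snapshots, updating metadata, creating vgroups) as its example. The step names and the `expand_failover` helper are hypothetical and are not real FlashArray REST endpoints.

```python
def expand_failover(vms):
    """Expand one hypothetical failover op into the DB-service calls it implies.

    `vms` maps a VM name to the list of its volume names. The step names are
    illustrative only.
    """
    calls = []
    for vm_name, volumes in vms.items():
        calls.append(f"create vgroup for {vm_name}")
        for volume in volumes:
            calls.append(f"copy {volume} out of snapshot")
            calls.append(f"update metadata for {volume}")
    return calls


small = {"vm-1": ["config", "data-1"]}
large = {f"vm-{i}": ["config", "data-1", "data-2"] for i in range(50)}
print(len(expand_failover(small)), "calls vs", len(expand_failover(large)), "calls")
```

The same VASA Op therefore costs the DB service a handful of calls for one small VM and hundreds of calls for a replication group with many VMs and disks.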
FlashArray Core Service
The FlashArray has a core service, called Purity OS, that runs on the FlashArray. Part of this core service's job is to handle all incoming SAN requests as well as API requests. Factors that can keep Purity from handling these requests promptly include, but are not limited to, the following:
- Overall high load on the FlashArray (Load Meter)
  - At higher levels of overall load on the FlashArray, fewer resources are available to VASA and the Database service for making requests to Purity. Additionally, Purity may take longer to process the requests issued from the DB Service.
- High front end IO workload and/or a high API workload
  - Purity always prioritizes front end IO requests, so if a high front end workload is also causing high load on the array, the Core Service takes longer to process the requests from the DB Service.
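The prioritization described above can be pictured with a toy priority queue in which front end IO always dequeues ahead of management requests. This is purely a conceptual sketch using Python's `heapq`. It is not how Purity schedules work internally, and the priority values and request labels are assumptions.

```python
import heapq
import itertools

FRONT_END_IO = 0   # lower number = served first (assumed priorities)
API_REQUEST = 1

_counter = itertools.count()  # tie-breaker keeps arrival order within a priority


def submit(queue, priority, label):
    """Add a request to the queue with the given priority."""
    heapq.heappush(queue, (priority, next(_counter), label))


def drain(queue):
    """Pop everything in service order and return the labels."""
    order = []
    while queue:
        _, _, label = heapq.heappop(queue)
        order.append(label)
    return order


queue = []
submit(queue, API_REQUEST, "bind vVol (from DB Service)")
submit(queue, FRONT_END_IO, "host write A")
submit(queue, FRONT_END_IO, "host read B")
service_order = drain(queue)
print(service_order)
```

Although the bind request arrived first, it is served last, which mirrors why management operations slow down when the array is busy with host IO.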
While the lifecycle of a VASA operation along the management path is fairly straightforward, additional considerations apply when reviewing the performance of the management path in the vVols ecosystem. Simply looking at the load of the FlashArray, ESXi hosts or vCenter server doesn't cover all aspects. Other factors include the volume of API calls coming to the FlashArray, the number of VASA requests coming from the vSphere environment, the FlashArray object scale and the overall scope of the VASA request being issued. There isn't a single point to troubleshoot or evaluate; rather, the whole picture needs to be viewed first, and then each aspect can be inspected.
Here are some other KBs that cover topics ranging from what's new in VASA 1.1.0 to understanding failure scenarios.