Troubleshooting: After upgrading to Purity//FA 6.0 or 6.1, SRM reports that vVol VMs are not replicated and queryReplicationGroup fails
What is the issue?
The VASA workflows allow the FlashArray VASA provider to return an error result to vSphere indicating that the batch of object IDs is too large and that the default batch size should be used. In Purity//FA 6.0 and 6.1, Pure Storage implemented this behavior for queryReplicationGroup. An issue was then found in how vSphere interprets that response: queryReplicationGroup does not honor the maxBatchSize advertised by VASA, nor does it retry after a TooMany error result.
What does this mean, and how does it happen? The default maxBatchSize on the FlashArray VASA provider is 4. If there are more than 4 source and target replication groups, queryReplicationGroup fails with the TooMany result and vSphere does not retry the request.
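The failure mode can be illustrated with a small simulation. This is only a sketch of the behavior described above, not the actual VASA or vSphere implementation: `FakeProvider`, `TooMany`, `query_without_retry`, and `query_in_batches` are hypothetical names, and the only value taken from this article is the default maxBatchSize of 4.

```python
from typing import Dict, List, Optional

# Default maxBatchSize on the FlashArray VASA provider, per this article.
MAX_BATCH_SIZE = 4


class TooMany(Exception):
    """Hypothetical stand-in for the VASA TooMany error result."""


class FakeProvider:
    """Hypothetical model of a provider that enforces maxBatchSize."""

    def __init__(self, group_ids: List[str], max_batch_size: int = MAX_BATCH_SIZE):
        self.groups: Dict[str, dict] = {gid: {"id": gid} for gid in group_ids}
        self.max_batch_size = max_batch_size

    def query_replication_group(self, ids: Optional[List[str]] = None) -> List[dict]:
        # Assumption: a request with no ID list (ids=None) asks for every group.
        requested = list(self.groups) if ids is None else ids
        if len(requested) > self.max_batch_size:
            raise TooMany(
                f"{len(requested)} IDs exceeds maxBatchSize {self.max_batch_size}"
            )
        return [self.groups[gid] for gid in requested]


def query_without_retry(provider: FakeProvider, ids: Optional[List[str]] = None) -> List[dict]:
    """Models the problem case: a single request, no batching, no retry."""
    return provider.query_replication_group(ids)


def query_in_batches(provider: FakeProvider, ids: List[str]) -> List[dict]:
    """Models a client that honors maxBatchSize by splitting the ID list."""
    results: List[dict] = []
    size = provider.max_batch_size
    for start in range(0, len(ids), size):
        results.extend(provider.query_replication_group(ids[start:start + size]))
    return results
```

With six replication groups, `query_without_retry` raises `TooMany` and returns nothing, while `query_in_batches` issues two requests (four IDs, then two) and returns all six groups. With four or fewer groups, both calls succeed, which is why the issue only appears in larger environments.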
This issue is fixed in Purity//FA 6.1.7 and higher.
How do you see it?
The issue can be seen in several places: SRM, the vCenter UI, the ESXi host logs, and the vCenter SPS logs.
In the SRM Server screen, you will see that target replication groups cannot be found for source replication groups. This workflow depends on the results from queryReplicationGroup in order to continue. As a result, SRM protection groups are unable to show the VMs in a replicated state.
Additionally, vCenter will be unable to show a list of replication groups when applying replication groups to VMs as part of applying SPBM storage policies.
With PowerCLI, any queryReplicationGroup request results in an SMS runtime fault.
PS /Users/Mac-Pro> Get-SpbmReplicationGroup
Get-SpbmReplicationGroup: 7/2/2021 5:14:27 PM Get-SpbmReplicationGroup SMS runtime fault on server '/VIServer=purecloud\email@example.com:443/': Unknown server error. See the event log for details.
Get-SpbmReplicationGroup: 7/2/2021 5:14:27 PM Get-SpbmReplicationGroup SMS runtime fault on server '/VIServer=purecloud\firstname.lastname@example.org:443/': Unknown server error. See the event log for details.
In the vCenter SPS log, the TooMany error is listed against the queryReplicationGroup call.
2021-07-02T22:18:25.818Z [pool-35-thread-10] ERROR opId=62ee8a49:594f com.vmware.vim.sms.provider.vasa.VasaProviderImpl - [queryReplicationGroup] Failed to queryReplicationGroup for provider 2195e46f-b8c3-48b6-a51a-c6122ef96dad
com.vmware.vim.sms.fault.VasaServiceException: java.rmi.RemoteException: TooMany; nested exception is:
    com.vmware.vim.vasa._3_0.TooMany: TooMany
    at com.vmware.vim.sms.client.VasaClientImpl.queryReplicationGroup(VasaClientImpl.java:1423)
    at sun.reflect.GeneratedMethodAccessor1357.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.vmware.vim.sms.client.VasaClientMethodInvoker.invokeMethod(VasaClientMethodInvoker.java:48)
    at com.vmware.vim.sms.client.VasaClientMethodInvoker.invoke(VasaClientMethodInvoker.java:35)
    at com.vmware.vim.sms.client.VasaClientHandler.invoke(VasaClientHandler.java:27)
    at com.sun.proxy.$Proxy112.queryReplicationGroup(Unknown Source)
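When the SPS log is large, a short script can help locate this signature. The following is a minimal sketch, assuming the log has been copied off the vCenter Server and is read as plain text. The function name and the `lookahead` parameter are illustrative, and the match strings are taken from the log excerpt above. Because the exception text may appear on the lines following the ERROR entry, the function checks a small window after each match.

```python
from typing import Iterable, List


def find_too_many_failures(lines: Iterable[str], lookahead: int = 5) -> List[str]:
    """Return ERROR entries for queryReplicationGroup that are followed by TooMany.

    A hit is a line containing "Failed to queryReplicationGroup" where "TooMany"
    appears on that line or within the next `lookahead` lines.
    """
    cleaned = [line.rstrip("\n") for line in lines]
    hits: List[str] = []
    for index, line in enumerate(cleaned):
        if "Failed to queryReplicationGroup" not in line:
            continue
        # Include the matching line itself plus the lines that follow it.
        window = cleaned[index:index + 1 + lookahead]
        if any("TooMany" in entry for entry in window):
            hits.append(line)
    return hits
```

To use it, open the copied log file and pass the file object to `find_too_many_failures`; each returned line carries the timestamp and provider ID of a failed call.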
Workarounds or fix?
There are two ways to correct this. The recommended resolution is to upgrade to Purity//FA 6.1.7 or higher, where this issue is resolved.
If an upgrade is not possible, open a Pure Support case. Support can adjust the VASA provider so that it no longer returns TooMany for unbound queryReplicationGroup requests.
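To check whether a given array falls in the affected range, the version boundaries stated in this article can be expressed as a simple comparison. This is a sketch under two assumptions: the version is supplied as a plain "major.minor.patch" string, and every release from 6.0.0 up to (but not including) 6.1.7 is treated as affected, since this article names only 6.1.7 and higher as fixed. The function names are illustrative.

```python
from typing import Tuple

# Boundaries taken from this article: introduced in 6.0, fixed in 6.1.7.
FIRST_AFFECTED = (6, 0, 0)
FIRST_FIXED = (6, 1, 7)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a 'major.minor.patch' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.strip().split(".")[:3])
    return major, minor, patch


def is_affected(version: str) -> bool:
    """Return True if the Purity//FA version is in the assumed affected range."""
    return FIRST_AFFECTED <= parse_version(version) < FIRST_FIXED
```

For example, `is_affected("6.1.6")` is true and `is_affected("6.1.7")` is false. A version string with fewer than three numeric components raises a `ValueError` rather than being guessed at.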