Manually disconnecting and reconnecting a volume results in a missing namespace
Symptom
After a volume is manually disconnected from an ESXi host and then presented back to it, its namespace is not listed among the devices available to the VMware Software NVMe-RDMA adapter. If multiple volumes are disconnected and reconnected, only the first namespace added back to the ESXi host is missing.
The ESXi host vmkernel log reports the following error:
2020-03-11T19:18:53.405Z cpu1:2098017)NvmeDiscover: 1020: Failed to add namespace on controller nqn.2010-06.com.purestorage:flasharray.1f3d6733c48eadcb#vmhba65#192.168.11.10, 10 namespace(s) have already been added to this controller
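To find which controllers are affected, one option is to scan the vmkernel log for this error. The sketch below is a minimal, unofficial example. It assumes every matching line follows the format of the sample above, with the controller named as `<subsystem NQN>#<adapter>#<target IP>`. The function name `find_affected_controllers` and the `AffectedController` record are hypothetical helpers, not part of any VMware or Pure Storage tool.

```python
import re
from typing import Iterable, List, NamedTuple

# Matches the NvmeDiscover error shown above. Assumption: the controller name is
# "<subsystem NQN>#<adapter>#<target IP>", exactly as in the sample log line.
ERROR_RE = re.compile(
    r"NvmeDiscover: \d+: Failed to add namespace on controller "
    r"(?P<nqn>[^#\s]+)#(?P<adapter>[^#\s]+)#(?P<ip>[^,\s]+), "
    r"(?P<count>\d+) namespace\(s\) have already been added"
)


class AffectedController(NamedTuple):
    nqn: str               # FlashArray subsystem NQN
    adapter: str           # ESXi adapter, e.g. vmhba65
    ip: str                # FlashArray target IP for this connection
    namespaces_added: int  # count reported in the error message


def find_affected_controllers(log_lines: Iterable[str]) -> List[AffectedController]:
    """Return each controller that logged the error, once per NQN/adapter/IP."""
    found = {}
    for line in log_lines:
        m = ERROR_RE.search(line)
        if m:
            key = (m["nqn"], m["adapter"], m["ip"])
            found[key] = AffectedController(
                m["nqn"], m["adapter"], m["ip"], int(m["count"])
            )
    return list(found.values())


sample = (
    "2020-03-11T19:18:53.405Z cpu1:2098017)NvmeDiscover: 1020: Failed to add "
    "namespace on controller nqn.2010-06.com.purestorage:flasharray."
    "1f3d6733c48eadcb#vmhba65#192.168.11.10, 10 namespace(s) have already "
    "been added to this controller"
)
affected = find_affected_controllers([sample])
for c in affected:
    print(c.adapter, c.ip, c.namespaces_added)
```

On a host, the same function could be given the lines of a copied vmkernel log file instead of the hard-coded sample.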
Cause
Because ESXi receives the Asynchronous Event Notification (AEN) in an unexpected way, the NVMe-RDMA controller is unable to add the namespace back to the available devices.
Resolution
This issue is resolved in ESXi 7.0 U1 and later.
For earlier ESXi releases, a workaround is available. Disconnect and reconnect one FlashArray IP address at a time to force the ESXi host to rediscover the missing namespace. If multipathing is configured correctly, these steps should not disrupt services.
Using the vCenter GUI:
The animation above shows an example of removing and re-adding a single NVMe-RDMA connection. Repeat the same process for each remaining connection.
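The ordering rule behind the workaround is to cycle one FlashArray IP address at a time, so that other paths keep carrying I/O. It can be sketched as follows. This is a hypothetical illustration, not a supported tool. The `disconnect`, `reconnect`, and `is_connected` callbacks stand in for whatever method is actually used (for example, the vCenter steps above), and the second IP address in the usage example is made up.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, NamedTuple, Tuple


class Connection(NamedTuple):
    adapter: str    # ESXi adapter, e.g. "vmhba65"
    target_ip: str  # FlashArray NVMe-RDMA interface IP


def rolling_reconnect(
    connections: Iterable[Connection],
    disconnect: Callable[[Connection], None],    # hypothetical callback
    reconnect: Callable[[Connection], None],     # hypothetical callback
    is_connected: Callable[[Connection], bool],  # hypothetical health check
) -> List[Tuple[str, Connection]]:
    """Disconnect and reconnect one FlashArray IP at a time; return the steps taken."""
    by_ip: Dict[str, List[Connection]] = defaultdict(list)
    for conn in connections:
        by_ip[conn.target_ip].append(conn)
    # With only one target IP, taking it down would leave the volumes with no path.
    if len(by_ip) < 2:
        raise RuntimeError("need at least two FlashArray IPs to keep a path up")
    steps: List[Tuple[str, Connection]] = []
    for ip, group in by_ip.items():  # dicts keep insertion order
        for conn in group:
            disconnect(conn)
            steps.append(("disconnect", conn))
        for conn in group:
            reconnect(conn)
            steps.append(("reconnect", conn))
        # Stop before touching the next IP if this one did not come back.
        if not all(is_connected(c) for c in group):
            raise RuntimeError(f"connections to {ip} did not return; stopping")
    return steps


# Dry run against an in-memory set instead of a real host.
conns = [Connection("vmhba65", "192.168.11.10"), Connection("vmhba65", "192.168.11.11")]
state = set(conns)
steps = rolling_reconnect(conns, state.discard, state.add, lambda c: c in state)
for action, conn in steps:
    print(action, conn.adapter, conn.target_ip)
```

The health check after each IP is the important design choice. It stops the procedure before a second path is removed while the first is still down.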