
How To: Setup NVMe-RDMA with VMware



With the initial release of vSphere 7.0, Pure Storage's HTML-5 vCenter plugin is not fully integrated with NVMe-oF functionality. One of the integrations not yet available is automatic setup of the vSphere environment (as we do with iSCSI). This KB serves as a guide for manually setting up your environment.

NOTE: This guide is specific to Pure Storage and vSphere setup; it does NOT cover setting up the switched fabric. Please see SAN Configuration (NVMe/RoCE pages) for information on those steps.

Please review Pure's NVMe-oF Support Matrix for additional switch and host HBA considerations.

For information on configuring other NVMe-oF transports, please refer to the following KBs:

How To: Setup NVMe-FC with VMware
How To: Setup NVMe/TCP with VMware

Configuring the vSphere Environment

Configuration of RDMA NICs on ESXi

NVMe/RoCE requires a lossless fabric. For FlashArray there are two options, and the ESXi portion of each is covered in this KB. In both cases, configuration is required on both the FlashArray and ESXi, and a Pure support case needs to be opened to set the array tunables. PFC should be used if possible, and either PFC or Global Pause should be enabled, but not both:

Enabling PFC on ESXi Hosts 

Mellanox RDMA NICs

PFC is not enabled by default on the Mellanox adapters, and needs to be configured manually.

VMware covers this in their KB for nmlx5_core drivers; follow the section Enabling DSCP based PFC.
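As a reference sketch only (VMware's KB is authoritative for the exact steps and values, and a host reboot is required for module parameter changes to take effect), the settings verified in the next section are typically applied with:

esxcli system module parameters set -m nmlx5_core -p "pfctx=0x08 pfcrx=0x08 trust_state=2"
esxcli system module parameters set -m nmlx5_rdma -p "dscp_force=26"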

Verify The Mellanox Changes on the ESXi Host

To list the RDMA device associations on the ESXi host and understand what is configured from an RDMA perspective, run the command esxcli rdma device list. Assuming RDMA-capable HBAs are installed, it will show output similar to the following:

[root@init110-13:~] esxcli rdma device list
Name     Driver      State    MTU  Speed    Paired Uplink  Description
-------  ----------  ------  ----  -------  -------------  -----------
vmrdma0  nmlx5_rdma  Active  1024  25 Gbps  vmnic4         MT27800 Family  [ConnectX-5 PCIe 3.0]
vmrdma1  nmlx5_rdma  Active  1024  25 Gbps  vmnic5         MT27800 Family  [ConnectX-5 PCIe 3.0]
vmrdma2  nmlx5_rdma  Active  4096  25 Gbps  vmnic6         MT27800 Family  [ConnectX-5 PCIe 3.0]
vmrdma3  nmlx5_rdma  Active  4096  25 Gbps  vmnic7         MT27800 Family  [ConnectX-5 PCIe 3.0]

You can see that there are four ports installed on this host that can be used for RDMA. In our example, vmnic6 and vmnic7 (MTU 4096) are configured for use, while vmnic4 and vmnic5 (MTU 1024) are not.

You can also see which driver the ports use: nmlx5_rdma. This tells you which modules to check for the required configuration changes. In our example, we want to check the nmlx5_rdma and nmlx5_core modules, because those are what VMware has us modify on the ESXi host.

To check the parameters required by VMware for RDMA on these two modules, first run this command against the core module on the host:

[root@init110-13:~] esxcli system module parameters list -m nmlx5_core | grep 'trust_state\|pfcrx\|pfctx'
pfcrx                  int            0x08   Priority based Flow Control policy on RX.
   Notes: Must be equal to pfctx.
pfctx                  int            0x08   Priority based Flow Control policy on TX.
   Notes: Must be equal to pfcrx.
trust_state            int            2      Port policy to calculate the switch priority and packet color based on incoming packet

The first column tells us the parameter we are looking at and the third column tells us the value currently set for that parameter. In this example, these three are all set correctly for our switched RDMA fabric.

Then we'll run this command on our host:

[root@init110-13:~] esxcli system module parameters list -m nmlx5_rdma | grep 'dscp_force'
dscp_force         int   26     DSCP value to force on outgoing RoCE traffic.

Same as before, the first and third columns are the ones we care about. In our example this value is set correctly for our switched RDMA fabric.

Mellanox Adapters Using nmlx4_core Drivers

If the ESXi hosts are using Mellanox adapters with the nmlx4_core driver, first follow the steps for nmlx5_core from VMware's KB above, and then enable RoCEv2, because the default operating mode for this particular driver is RoCEv1.

To determine which driver your Mellanox adapters are using, look at the Driver column in the output of the following command on the ESXi host(s):

[root@barney:~] esxcfg-nics -l |grep -E 'Name|Mellanox'
Name    PCI          Driver      Link Speed      Duplex MAC Address       MTU    Description
vmnic4  0000:42:00.0 nmlx5_core  Up   25000Mbps  Full   ec:0d:9a:82:5a:32 9000   Mellanox Technologies ConnectX-4 Lx EN NIC; 25GbE; dual-port SFP28; (MCX4121A-ACA)
vmnic5  0000:42:00.1 nmlx5_core  Up   25000Mbps  Full   ec:0d:9a:82:5a:33 9000   Mellanox Technologies ConnectX-4 Lx EN NIC; 25GbE; dual-port SFP28; (MCX4121A-ACA)

In the example above, the Mellanox adapters are using the nmlx5_core driver, so no additional changes are required here. If yours report nmlx4_core, then follow the steps below.

The steps below apply only to the nmlx4_core driver; they are not required if the Mellanox adapter is using the nmlx5_core driver.

1. Run the following command via SSH on all the ESXi hosts that will be connected via NVMe-RDMA:

esxcli system module parameters set -p enable_rocev2=1 -m nmlx4_core

2. Reboot the host.

Once the host has rebooted, you can confirm the change has taken effect by running the following command:

[root@barney:~] esxcli system module parameters list --module=nmlx4_core |grep -E 'Name|enable_rocev2'
 Name                    Type  Value  Description
 enable_rocev2           int   1      Enable RoCEv2 mode for all devices

Enabling Global Pause Flow Control on ESXi hosts (Direct Connect Only)

Follow the Enabling Global Pause Flow Control section in VMware's KB.
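As a rough sketch only (defer to VMware's KB for the authoritative steps, and note that not every NIC driver supports changing pause settings this way), the current pause flow control state can be checked, and where supported set, from the ESXi CLI; replace vmnic4 with your RDMA uplink:

esxcli network nic pauseParams list
esxcli network nic pauseParams set -n vmnic4 -r true -t true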

Please open a case with Pure Support to enable Global Pause on the array. The array defaults to DSCP-based PFC, and Support action is required to enable Global Pause.

Creation of vSwitches, Port Groups, and vmkernel Ports

Once you have enabled the appropriate configuration on your physical RDMA NICs (if applicable), the next step is to configure the vSwitches, port groups, and vmkernel ports on the ESXi hosts. If you have set up iSCSI previously, you will notice that this is a very similar setup and configuration process.

The configuration below is only one option (using standard vSwitches). Using a vSphere Distributed Switch (vDS) is also an acceptable configuration.

The important points to consider when setting up your environment:

- At least 2 different port groups are required. (You may have up to 4).
- Each port group should have only 1 physical RDMA NIC port in the "Active adapters" section. The other adapter port(s) should be in the "Unused adapters" section.
- Ensure MTU is configured consistently between the vmkernel adapters and the virtual switch.

Example setup configuration for standard vSwitches:

1. Select the desired ESXi host, select the Configure tab, locate the Networking section, and select Virtual switches.

Once you are on the Virtual switches page, select Add Networking.

2. Select VMkernel Network Adapter.

3. Select New standard switch and input the desired MTU size.

4. Add one of the desired Physical NIC ports to the Active Adapters leaving the Standby and Unused adapters empty.

5. Input the desired name of your Port Group Network label and any other applicable settings. Ensure you leave all of the Enabled services unchecked.

6. Input the desired IP address and subnet mask for the vmknic.

7. Review all of the listed fields and ensure the configuration is correct, then Finish the setup.

8. Repeat steps 1 - 7 for the other vSwitch, port group, and vmkernel port.
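If you prefer to script the setup, the same configuration from steps 1 - 7 can be sketched with esxcli. The names and addresses below (vSwitch2, NVMe-RDMA-A, vmk2, vmnic6, 10.21.1.10/24, MTU 9000) are placeholders only; substitute your own values:

# Create the vSwitch, set its MTU, and attach one RDMA uplink as the only active adapter
esxcli network vswitch standard add -v vSwitch2
esxcli network vswitch standard set -v vSwitch2 -m 9000
esxcli network vswitch standard uplink add -v vSwitch2 -u vmnic6
esxcli network vswitch standard policy failover set -v vSwitch2 -a vmnic6

# Create the port group and a vmkernel port with a static IP and matching MTU
esxcli network vswitch standard portgroup add -v vSwitch2 -p NVMe-RDMA-A
esxcli network ip interface add -i vmk2 -p NVMe-RDMA-A -m 9000
esxcli network ip interface ipv4 set -i vmk2 -t static -I 10.21.1.10 -N 255.255.255.0

You can confirm MTU consistency afterwards with esxcli network vswitch standard list and esxcli network ip interface list, then repeat the same sketch for the second vSwitch, port group, and vmkernel port.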

A common question when configuring RDMA capable NICs (RNICs) for use with ESXi is, "Can I configure the RNIC with multiple vSphere Services (NVMe-RDMA, vMotion, Replication, etc) to take advantage of the higher performing network adapter?".

While it is possible to configure each RNIC with multiple services, it is not recommended by VMware or Pure Storage, primarily because of the complexity required to ensure every service works as expected in this configuration.

While it is not currently recommended, this may change in the future as Pure Storage and VMware work together to investigate how it can be better supported.

Creation and Configuration of VMware Software NVMe-RDMA Storage Adapters

After each ESXi host is properly configured with network connectivity, the next step is to create the Software NVMe-RDMA storage adapters.

It is important to point out that you will add at least two "Software NVMe over RDMA adapters" to each ESXi host, as outlined in Step 2 below.

1. Select the desired ESXi host, select the Configure tab, locate the "Storage" section, and select Storage Adapters.

Once you are on the Storage Adapters page, select Add Software Adapter.

2. Select Add software NVMe over RDMA adapter and choose which RDMA device you want to add.

You will repeat this process for all RDMA ports you plan on using.

If you have more than 2 RDMA adapters available and do not plan on using all of them, you can look at the following ESXi host CLI output to compare which physical ports are associated with each virtual RDMA adapter:

[root@barney:~] esxcli rdma device list
Name     Driver      State    MTU  Speed    Paired Uplink  Description
-------  ----------  ------  ----  -------  -------------  -----------
vmrdma0  nmlx5_rdma  Active  4096  25 Gbps  vmnic4         MT27630 Family  [ConnectX-4 LX]
vmrdma1  nmlx5_rdma  Active  4096  25 Gbps  vmnic5         MT27630 Family  [ConnectX-4 LX]

3. After you have added all applicable Software NVMe-RDMA storage adapters, the next step is to configure the Controllers for every adapter.

You will select the applicable adapter you wish to configure, click Controllers and then select Add Controller.

4. You can decide to Automatically discover controllers or Enter controller details manually. For simplicity, an automatic discovery is recommended unless otherwise directed by Pure Storage.

You will repeat this process for ALL FlashArray IP addresses dedicated to NVMe-RDMA connectivity. (A CLI sketch covering steps 2 - 4 follows the claimrule commands in step 5 below.)

5. For optimal performance with Pure-backed devices, a claimrule needs to be added on the ESXi hosts that will be used with NVMe/RoCEv2. Please note that there are two different versions of this claimrule depending on the ESXi version.

  1. On ESXi 8.0 U1 and later, please run the following two commands. Please note no reboot is required:
    esxcli storage core claimrule add --rule 102 -t vendor --nvme-controller-model "Pure*" -P HPP -g "pss=LB-Latency,latency-eval-time=180000,sampling-ios-per-path=16"
    
    esxcli storage core claimrule load
  2. On ESXi 8.0 GA and earlier, please run the following two commands. (more information here) Please note no reboot is required:
    esxcli storage core claimrule add --rule 102 -t vendor -P HPP -V NVMe -M "Pure*" --config-string "pss=LB-Latency,latency-eval-time=180000"
    
    esxcli storage core claimrule load
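For reference, steps 2 - 4 above also have CLI equivalents in the esxcli nvme fabrics namespace introduced with vSphere 7.0. The sketch below uses placeholder values (vmrdma0, vmhba67, and 10.21.1.100 are examples), and option names can differ between ESXi builds, so check esxcli nvme fabrics discover --help before relying on it; the vSphere Client workflow above remains the recommended path:

# Create a software NVMe over RDMA adapter on an RDMA device (step 2)
esxcli nvme fabrics enable -p RDMA -d vmrdma0

# Discover and connect controllers on a FlashArray NVMe-RDMA IP (steps 3 - 4)
esxcli nvme fabrics discover -a vmhba67 -i 10.21.1.100 -p 4420
esxcli nvme fabrics connect -a vmhba67 -i 10.21.1.100 -p 4420 -s <subsystem NQN from discovery>

After step 5, esxcli storage core claimrule list can be used to confirm that rule 102 was added and loaded.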

Identifying the NVMe Qualified Name (NQN) of an ESXi Host

After you have configured your ESXi hosts, the next step is to record the NVMe Qualified Name (NQN) of each ESXi host you plan to connect to the FlashArray. To find it, run the following command on each host:

[root@barney:~] esxcli nvme info get
   Host NQN: nqn.2014-08.com.vmware:nvme:barney
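If you only want the NQN string itself (for example, to paste into the FlashArray), a quick one-liner, assuming the busybox awk available in the ESXi shell, is:

esxcli nvme info get | awk '/Host NQN/ {print $3}'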

Until NVMe-oF is implemented into the vSphere API, this is the only known way to get this information. This KB will be updated with a PowerShell and Python example once the API becomes available.

 

Configuring the FlashArray Host Objects

Creation of Host and Host Group Objects

From the FlashArray perspective, the process differs little from setting up iSCSI or FC. If you have questions about creating hosts and host groups, refer to the FlashArray Configuration KB, which guides you through that process.

As a complementary addition to that KB, the NVMe-oF-specific step on the FlashArray is adding each ESXi host's NQN (collected above) to its corresponding host object.
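If you prefer the Purity CLI, a minimal sketch of creating a host with its NQN and adding it to a host group follows. The host name barney and host group name esxi-cluster are examples, and the option names should be confirmed with purehost create --help on your Purity version:

purehost create --nqnlist nqn.2014-08.com.vmware:nvme:barney barney
purehgroup create --hostlist barney esxi-cluster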