Pure Technical Services

Configuring an Arista switch for use with Pure FlashArray NVMe/RoCE

Overview

NVMe/RoCE is one of the transports that can be used to present a remote NVMe namespace to a host as if it were a local device. The transport uses RoCEv2, which runs over IPv4 User Datagram Protocol (UDP).

NVMe/RoCE traffic can be passed between the initiator and target using standard Ethernet and IP routing capabilities. RoCE requires a lossless network. To provide this, the switch needs to support industry-standard congestion control mechanisms.

Network Topology - Single Hop

A single hop network topology is one where the initiator and targets are separated by a single switch hop. This design uses two switches. Each initiator should have at least one port connected to each switch, and each controller on the FlashArray should have a port connected to each switch, as shown in the diagram.

[Figure: Single Card Initiator NVMe Fabric Addressing, 4 port configuration]

In this design there should be two subnets, with one VLAN on each switch. Those subnets should not be trunked on any connections between the two switches. The array and initiator should have ports configured in each subnet, and those ports should be connected to the corresponding access ports on the switches.
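The addressing rules above can be checked before any cabling is done. The following is a minimal sketch in Python, not part of the validated procedure. The VLAN IDs match the examples later in this article, but the IP subnets, port names, and the `check_plan` helper are illustrative assumptions.

```python
import ipaddress

# Hypothetical addressing plan: one subnet and one VLAN per switch.
# All IP addresses and port names below are made up for illustration.
fabric = {
    "switchA": {
        "vlan": 986,
        "subnet": ipaddress.ip_network("192.168.86.0/24"),
        "ports": {"CT0": "192.168.86.10", "CT1": "192.168.86.11",
                  "host1-p0": "192.168.86.101"},
    },
    "switchB": {
        "vlan": 987,
        "subnet": ipaddress.ip_network("192.168.87.0/24"),
        "ports": {"CT0": "192.168.87.10", "CT1": "192.168.87.11",
                  "host1-p1": "192.168.87.101"},
    },
}

def check_plan(plan):
    """Return a list of problems found in the plan (empty if none)."""
    problems = []
    # The two subnets must be distinct and must not overlap.
    nets = [(name, sw["subnet"]) for name, sw in plan.items()]
    for i, (name_a, net_a) in enumerate(nets):
        for name_b, net_b in nets[i + 1:]:
            if net_a.overlaps(net_b):
                problems.append(f"{name_a} and {name_b} subnets overlap")
    # Every port attached to a switch must be addressed in that switch's subnet.
    for name, sw in plan.items():
        for port, addr in sw["ports"].items():
            if ipaddress.ip_address(addr) not in sw["subnet"]:
                problems.append(f"{port} ({addr}) is outside {sw['subnet']} on {name}")
    return problems

problems = check_plan(fabric)
```

An empty `problems` list means each array and initiator port sits in the subnet of the switch it is cabled to, and the two subnets are disjoint.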

Arista Switch Requirements

The following are the minimum requirements for the Arista Switch:

  • Extensible Operating System (EOS) version 4.20.5F or greater

  • Support for 100/50/25G

  • Support for Jumbo Frames (minimum 9000 Bytes)

  • Support for the following QoS features:

    • Per-Priority Flow Control (PFC)

    • Data Center Bridging Extensions (DCBX)

    • QoS interface trust for DSCP and COS

    • 8 queues per port

    • DSCP based classification and remarking

 

The following switch models have been validated by Pure Storage:

  • Arista 7060CX2-32S

Switch Configuration

The switch configuration consists of configuring global (switch wide) parameters, and interface parameters for the initiator (host server) and the target (FlashArray).

Global configuration

For each switch create a VLAN for the NVMe/RoCE traffic and create a trunk group in order to prune the traffic from any trunk links on the switch.

Switch A

switchA#config t
switchA(config)#vlan 986
switchA(config-vlan-986)#name NVMe-RoCE-VLAN
switchA(config-vlan-986)#trunk group NVMe
switchA(config-vlan-986)#end
switchA#wr

Switch B

switchB#config t
switchB(config)#vlan 987
switchB(config-vlan-987)#name NVMe-RoCE-VLAN
switchB(config-vlan-987)#trunk group NVMe
switchB(config-vlan-987)#end
switchB#wr

Configure Array Interfaces

For the array interfaces you will need to set the MTU to 9000 (if required; Arista switches typically default to an MTU of 9000) and set the access VLAN, spanning-tree parameters, DCBX mode, and QoS features as shown.

Switch A

switchA#conf t
switchA(config)#interface e1/1-2
switchA(config-if-Et1/1-2)#description FlashArray Ports
switchA(config-if-Et1/1-2)#mtu 9000
switchA(config-if-Et1/1-2)#switchport access vlan 986
switchA(config-if-Et1/1-2)#spanning-tree portfast
switchA(config-if-Et1/1-2)#spanning-tree bpduguard
switchA(config-if-Et1/1-2)#dcbx mode ieee
switchA(config-if-Et1/1-2)#qos trust dscp
switchA(config-if-Et1/1-2)#priority-flow-control mode on
switchA(config-if-Et1/1-2)#priority-flow-control priority 3 no-drop
switchA(config-if-Et1/1-2)#end
switchA#wr

Switch B


switchB#conf t
switchB(config)#interface e1/1-2
switchB(config-if-Et1/1-2)#description Array Ports
switchB(config-if-Et1/1-2)#mtu 9000
switchB(config-if-Et1/1-2)#switchport access vlan 987
switchB(config-if-Et1/1-2)#spanning-tree portfast
switchB(config-if-Et1/1-2)#spanning-tree bpduguard
switchB(config-if-Et1/1-2)#dcbx mode ieee
switchB(config-if-Et1/1-2)#qos trust dscp
switchB(config-if-Et1/1-2)#priority-flow-control mode on
switchB(config-if-Et1/1-2)#priority-flow-control priority 3 no-drop
switchB(config-if-Et1/1-2)#end
switchB#wr

Configure Initiator Interfaces

For the initiator interfaces you will need to set the MTU to 9000 (if required; Arista switches typically default to an MTU of 9000) and set the access VLAN, spanning-tree parameters, DCBX mode, and QoS features as shown.

Switch A 

switchA#conf t
switchA(config)#interface e1/3-4
switchA(config-if-Et1/3-4)#description Initiator Ports
switchA(config-if-Et1/3-4)#mtu 9000
switchA(config-if-Et1/3-4)#switchport access vlan 986
switchA(config-if-Et1/3-4)#spanning-tree portfast
switchA(config-if-Et1/3-4)#spanning-tree bpduguard
switchA(config-if-Et1/3-4)#dcbx mode ieee
switchA(config-if-Et1/3-4)#qos trust dscp
switchA(config-if-Et1/3-4)#priority-flow-control mode on
switchA(config-if-Et1/3-4)#priority-flow-control priority 3 no-drop
switchA(config-if-Et1/3-4)#end
switchA#wr

Switch B

switchB#conf t
switchB(config)#interface e1/3-4
switchB(config-if-Et1/3-4)#description Initiator Ports
switchB(config-if-Et1/3-4)#mtu 9000
switchB(config-if-Et1/3-4)#switchport access vlan 987
switchB(config-if-Et1/3-4)#spanning-tree portfast
switchB(config-if-Et1/3-4)#spanning-tree bpduguard
switchB(config-if-Et1/3-4)#dcbx mode ieee
switchB(config-if-Et1/3-4)#qos trust dscp
switchB(config-if-Et1/3-4)#priority-flow-control mode on
switchB(config-if-Et1/3-4)#priority-flow-control priority 3 no-drop
switchB(config-if-Et1/3-4)#end
switchB#wr
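The array and initiator stanzas above differ only in the port range, the description, and the access VLAN. For sites that template their switch configuration, that repetition can be captured in a small helper. The sketch below is an illustration in Python; the `render_interface_config` function is hypothetical, and the command strings are copied verbatim from the examples in this article rather than checked against any particular EOS release.

```python
def render_interface_config(ports, description, vlan, pfc_priority=3):
    """Return the interface commands from this article as a list of lines.

    ports        -- interface range as typed at the CLI, e.g. "e1/1-2"
    description  -- free-text port description
    vlan         -- access VLAN for the NVMe/RoCE traffic on this switch
    pfc_priority -- PFC priority to make no-drop (3 in this article)
    """
    return [
        f"interface {ports}",
        f"description {description}",
        "mtu 9000",
        f"switchport access vlan {vlan}",
        "spanning-tree portfast",
        "spanning-tree bpduguard",
        "dcbx mode ieee",
        "qos trust dscp",
        "priority-flow-control mode on",
        f"priority-flow-control priority {pfc_priority} no-drop",
    ]

# The four stanzas used in this article: (switch, ports, description, VLAN).
stanzas = [
    ("switchA", "e1/1-2", "FlashArray Ports", 986),
    ("switchA", "e1/3-4", "Initiator Ports", 986),
    ("switchB", "e1/1-2", "FlashArray Ports", 987),
    ("switchB", "e1/3-4", "Initiator Ports", 987),
]

for switch, ports, description, vlan in stanzas:
    print(f"! {switch}")
    print("\n".join(render_interface_config(ports, description, vlan)))
```

The output is plain text to be reviewed and pasted in configuration mode; the sketch does not connect to a switch.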

QoS Validation

Once you have completed the setup of the FlashArray and the initiator and have an active NVMe/RoCE connection between the devices, you should be able to see traffic in Unicast Queue 3 (UC3) on the switch ports.

Use the show interface <ifname> counter queue detail command to verify that the traffic is being seen on queue 3.

switchA(config)#show int e1/1 counter queue detail
Port    TxQ   Counter/pkts  Counter/bytes Drop/pkts       Drop/bytes
------- ----  ------------  ------------  ------------    ------------
Et1/1   UC0   0              0            0               0
Et1/1   UC1   0              0            0               0
Et1/1   UC2   0              0            0               0
Et1/1   UC3   413362         38855900     0               0 
Et1/1   UC4   0              0            0               0
Et1/1   UC5   0              0            0               0
Et1/1   UC6   0              0            0               0
Et1/1   UC7   0              0            0               0
Et1/1   UC8   12             1716         0               0 
Et1/1   MC0   0              0            0               0
Et1/1   MC1   12             1008         0               0
Et1/1   MC2   0              0            0               0
Et1/1   MC3   0              0            0               0
Et1/1   MC4   0              0            0               0
Et1/1   MC5   0              0            0               0
Et1/1   MC6   0              0            0               0
Et1/1   MC7   0              0            0               0
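If this check is repeated across many ports, the counter table can be parsed from captured command output. The following Python sketch assumes the six-column layout shown above; the `parse_queue_counters` helper is hypothetical and the sample text is an excerpt of the output in this article.

```python
def parse_queue_counters(text):
    """Parse 'show interface <ifname> counter queue detail' output.

    Returns {(port, queue): {"pkts", "bytes", "drop_pkts", "drop_bytes"}}.
    Header and separator lines are skipped because their second column
    is not a UCn/MCn queue name.
    """
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6 or fields[1][:2] not in ("UC", "MC"):
            continue
        if not all(f.isdigit() for f in fields[2:]):
            continue
        pkts, byts, drop_pkts, drop_bytes = (int(f) for f in fields[2:])
        rows[(fields[0], fields[1])] = {
            "pkts": pkts, "bytes": byts,
            "drop_pkts": drop_pkts, "drop_bytes": drop_bytes,
        }
    return rows

sample = """\
Port    TxQ   Counter/pkts  Counter/bytes Drop/pkts       Drop/bytes
------- ----  ------------  ------------  ------------    ------------
Et1/1   UC2   0              0            0               0
Et1/1   UC3   413362         38855900     0               0
Et1/1   UC4   0              0            0               0
"""

rows = parse_queue_counters(sample)
uc3 = rows[("Et1/1", "UC3")]
# NVMe/RoCE traffic should be counted in UC3 with no drops.
healthy = uc3["pkts"] > 0 and uc3["drop_pkts"] == 0
```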

Verify that priority flow control is enabled on the interface with the show interface <ifname> priority-flow-control status command. You should see E A W for the interface, indicating that PFC is Enabled, Active, and Watchdog Enabled for PFC priority 3.

switchA#show int e1/1 priority-flow-control status
The hardware supports PFC on priorities 0 1 2 3 4 5 6 7
The PFC watchdog timeout is 0.0 second(s) (default)
The PFC watchdog recovery-time is 0.0 second(s) (auto) (default)
The PFC watchdog polling-interval is 0.0 second(s) (default)
The PFC watchdog action is errdisable
The PFC watchdog override action drop is false
The PFC watchdog non-disruptive priorities are not configured
The PFC watchdog port non-disruptive-only is false
Global PFC : Enabled


E: PFC Enabled, D: PFC Disabled, A: PFC Active, W: PFC Watchdog Enabled
Port       Status  Priorities  Action      Timeout  Recovery        Polling         Note                                                 
                                                  Interval/Mode   Config/Oper                                                          
---------------------------------------------------------------------------------------
Et1/1      E A W   3             -          -         - / -          - / -
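The per-port row of this output can also be checked in a script. The sketch below, in Python, assumes the row layout shown above: the port name, then the single-letter status flags, then the enabled priorities as space-separated numbers. The `parse_pfc_status_row` helper is hypothetical.

```python
def parse_pfc_status_row(line):
    """Split one per-port status row into (port, flags, priorities)."""
    fields = line.split()
    port = fields[0]
    flags, priorities = set(), []
    i = 1
    # Status flags come first: E, D, A, W as described in the legend.
    while i < len(fields) and fields[i] in ("E", "D", "A", "W"):
        flags.add(fields[i])
        i += 1
    # The enabled PFC priorities follow the flags.
    while i < len(fields) and fields[i].isdigit():
        priorities.append(int(fields[i]))
        i += 1
    return port, flags, priorities

row = "Et1/1      E A W   3             -          -         - / -          - / -"
port, flags, priorities = parse_pfc_status_row(row)
# Expect Enabled, Active, and Watchdog Enabled, with priority 3 listed.
pfc_ok = {"E", "A", "W"} <= flags and 3 in priorities
```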

When there is congestion in the network, you can verify that PFC is working by using the show interface <ifname> priority-flow-control counters detail command to confirm that pause frames are being counted for PFC priority 3 (the PFC3 column). If the counters have not been reset on the host (array or initiator) or on the switch, these values should match the values seen on the host.

switchA#show int e1/1 priority-flow-control counters detail
Port Rx            PFC0         PFC1         PFC2         PFC3         PFC4         PFC5         PFC6         PFC7
Et1/1                 0            0            0     1826880            0            0            0            0


Port Tx            PFC0         PFC1         PFC2         PFC3         PFC4         PFC5         PFC6         PFC7
Et1/1                 0            0            0         3352            0            0            0            0
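To compare these counters with the values reported by the host, the PFC3 column can be extracted from captured output. The Python sketch below assumes the layout shown above, with an Rx or Tx header row followed by one row per port holding eight counters; the `parse_pfc_counters` helper is hypothetical.

```python
def parse_pfc_counters(text):
    """Parse 'priority-flow-control counters detail' output.

    Returns {port: {"Rx": [c0..c7], "Tx": [c0..c7]}}, where index n holds
    the pause-frame count for PFC priority n.
    """
    counters = {}
    direction = None
    for line in text.splitlines():
        fields = line.split()
        # Header row: second column is Rx or Tx, followed by PFC0..PFC7.
        if len(fields) >= 3 and fields[1] in ("Rx", "Tx") and fields[2].startswith("PFC"):
            direction = fields[1]
            continue
        # Data row: port name followed by eight integer counters.
        if direction and len(fields) == 9 and all(f.isdigit() for f in fields[1:]):
            counters.setdefault(fields[0], {})[direction] = [int(f) for f in fields[1:]]
    return counters

sample = """\
Port Rx            PFC0         PFC1         PFC2         PFC3         PFC4         PFC5         PFC6         PFC7
Et1/1                 0            0            0     1826880            0            0            0            0

Port Tx            PFC0         PFC1         PFC2         PFC3         PFC4         PFC5         PFC6         PFC7
Et1/1                 0            0            0         3352            0            0            0            0
"""

counters = parse_pfc_counters(sample)
rx_pfc3 = counters["Et1/1"]["Rx"][3]  # pause frames received on priority 3
tx_pfc3 = counters["Et1/1"]["Tx"][3]  # pause frames sent on priority 3
```

Non-zero values only in the PFC3 position, as in this sample, are consistent with pause frames being exchanged on the no-drop priority configured earlier.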