Configuring a Juniper switch for use with Pure FlashArray NVMe/RoCE
Overview
NVMe/RoCE is one of the transports that can present a remote NVMe namespace to a host as if it were a local device. The transport uses RoCEv2, which runs over the User Datagram Protocol (UDP) on IPv4.
NVMe/RoCE traffic can be passed between the initiator and target using standard Ethernet and IP routing capabilities. RoCE requires a lossless network. To provide one, the switch must support industry-standard congestion control mechanisms.
Network Topology - Single Hop
A single-hop network topology is one where the initiators and targets are separated by a single switch hop. This design should have two switches, with at least one port from each initiator connected to each switch and a port from each FlashArray controller connected to each switch, as shown in the diagram.
This design should have two subnets, with one VLAN on each switch. Those subnets should not be trunked across any connections between the two switches. The array and each initiator should have ports configured in each subnet, and those ports should be connected to the corresponding access ports on the switches.
Juniper Switch Requirements
The following are the minimum requirements for the Juniper Switch:
- Junos OS 17.4R1.16 or later
- Support for 100/50/25G
- Support for jumbo frames (minimum 9000 bytes)
- Support for the following QoS features:
  - Priority-based Flow Control (PFC)
  - Data Center Bridging Capability Exchange (DCBX)
  - QoS interface trust for DSCP and CoS
  - 8 queues per port
  - DSCP-based classification and remarking
- The following switch models have been validated by Pure Storage:
  - Juniper QFX5200-32C-32Q
Switch Configuration
The switch configuration consists of global (switch-wide) parameters and interface parameters for the initiator (host server) and the target (FlashArray).
Global configuration
On each switch, create a VLAN for the NVMe/RoCE traffic and enable PFC on queue 3 for traffic marked with DSCP 26 (binary code point 011010).
Switch A
{master:0}[edit]
root@SwitchA# set vlans vlan986 vlan-id 986
{master:0}[edit]
root@SwitchA# set class-of-service forwarding-classes class roce queue-num 3 no-loss pfc-priority 3
{master:0}[edit]
root@SwitchA# set class-of-service congestion-notification-profile roce input dscp code-point 011010 pfc mru 4200
{master:0}[edit]
root@SwitchA# set class-of-service classifiers dscp roce forwarding-class roce loss-priority low code-points 011010
{master:0}[edit]
root@SwitchA# commit
Switch B
{master:0}[edit]
root@SwitchB# set vlans vlan987 vlan-id 987
{master:0}[edit]
root@SwitchB# set class-of-service forwarding-classes class roce queue-num 3 no-loss pfc-priority 3
{master:0}[edit]
root@SwitchB# set class-of-service congestion-notification-profile roce input dscp code-point 011010 pfc mru 4200
{master:0}[edit]
root@SwitchB# set class-of-service classifiers dscp roce forwarding-class roce loss-priority low code-points 011010
{master:0}[edit]
root@SwitchB# commit
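The code point 011010 in the commands above is the 6-bit binary form of DSCP 26. The following is a minimal Python sketch of that arithmetic, useful when a host or array setting is given in decimal or as a full ToS byte. The function names are hypothetical helpers for illustration and are not part of any Juniper or Pure Storage tool.

```python
def dscp_to_code_point(dscp: int) -> str:
    """Return the 6-bit binary string form of a DSCP value (for example 26 -> 011010)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return format(dscp, "06b")


def dscp_to_tos_byte(dscp: int) -> int:
    """Return the IP ToS byte for a DSCP value, with the two low-order ECN bits set to 0."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    # DSCP occupies the upper six bits of the ToS byte.
    return dscp << 2


if __name__ == "__main__":
    print(dscp_to_code_point(26))      # 011010
    print(hex(dscp_to_tos_byte(26)))   # 0x68
```

The ToS value is only relevant where a device expects the whole byte rather than the DSCP field; the switch configuration in this guide uses the 6-bit code point.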
Configure Array Interfaces
For each array-facing interface on each switch, set the MTU to 9000, assign the access VLAN, and apply the QoS features as shown.
Switch A
{master:0}[edit]
root@SwitchA# delete interfaces et-0/0/0:0 unit 0 family inet
{master:0}[edit]
root@SwitchA# set interfaces et-0/0/0:0 mtu 9000
{master:0}[edit]
root@SwitchA# set interfaces et-0/0/0:0 ether-options no-flow-control
{master:0}[edit]
root@SwitchA# set interfaces et-0/0/0:0 unit 0 family ethernet-switching
{master:0}[edit]
root@SwitchA# set interfaces et-0/0/0:0 unit 0 family ethernet-switching interface-mode access
{master:0}[edit]
root@SwitchA# set interfaces et-0/0/0:0 unit 0 family ethernet-switching interface-mode access vlan members vlan986
{master:0}[edit]
root@SwitchA# set class-of-service interfaces et-0/0/0:0 congestion-notification-profile roce
{master:0}[edit]
root@SwitchA# set class-of-service interfaces et-0/0/0:0 unit 0 classifiers dscp roce
{master:0}[edit]
root@SwitchA# commit
Switch B
{master:0}[edit]
root@SwitchB# delete interfaces et-0/0/0:0 unit 0 family inet
{master:0}[edit]
root@SwitchB# set interfaces et-0/0/0:0 mtu 9000
{master:0}[edit]
root@SwitchB# set interfaces et-0/0/0:0 ether-options no-flow-control
{master:0}[edit]
root@SwitchB# set interfaces et-0/0/0:0 unit 0 family ethernet-switching
{master:0}[edit]
root@SwitchB# set interfaces et-0/0/0:0 unit 0 family ethernet-switching interface-mode access
{master:0}[edit]
root@SwitchB# set interfaces et-0/0/0:0 unit 0 family ethernet-switching interface-mode access vlan members vlan987
{master:0}[edit]
root@SwitchB# set class-of-service interfaces et-0/0/0:0 congestion-notification-profile roce
{master:0}[edit]
root@SwitchB# set class-of-service interfaces et-0/0/0:0 unit 0 classifiers dscp roce
{master:0}[edit]
root@SwitchB# commit
Configure Initiator Interfaces
For each host/initiator-facing interface on each switch, set the MTU to 9000, assign the access VLAN, and apply the QoS features as shown.
Switch A
{master:0}[edit]
root@SwitchA# delete interfaces et-0/0/3:0 unit 0 family inet
{master:0}[edit]
root@SwitchA# set interfaces et-0/0/3:0 mtu 9000
{master:0}[edit]
root@SwitchA# set interfaces et-0/0/3:0 ether-options no-flow-control
{master:0}[edit]
root@SwitchA# set interfaces et-0/0/3:0 unit 0 family ethernet-switching
{master:0}[edit]
root@SwitchA# set interfaces et-0/0/3:0 unit 0 family ethernet-switching interface-mode access
{master:0}[edit]
root@SwitchA# set interfaces et-0/0/3:0 unit 0 family ethernet-switching interface-mode access vlan members vlan986
{master:0}[edit]
root@SwitchA# set class-of-service interfaces et-0/0/3:0 congestion-notification-profile roce
{master:0}[edit]
root@SwitchA# set class-of-service interfaces et-0/0/3:0 unit 0 classifiers dscp roce
{master:0}[edit]
root@SwitchA# commit
Switch B
{master:0}[edit]
root@SwitchB# delete interfaces et-0/0/3:0 unit 0 family inet
{master:0}[edit]
root@SwitchB# set interfaces et-0/0/3:0 mtu 9000
{master:0}[edit]
root@SwitchB# set interfaces et-0/0/3:0 ether-options no-flow-control
{master:0}[edit]
root@SwitchB# set interfaces et-0/0/3:0 unit 0 family ethernet-switching
{master:0}[edit]
root@SwitchB# set interfaces et-0/0/3:0 unit 0 family ethernet-switching interface-mode access
{master:0}[edit]
root@SwitchB# set interfaces et-0/0/3:0 unit 0 family ethernet-switching interface-mode access vlan members vlan987
{master:0}[edit]
root@SwitchB# set class-of-service interfaces et-0/0/3:0 congestion-notification-profile roce
{master:0}[edit]
root@SwitchB# set class-of-service interfaces et-0/0/3:0 unit 0 classifiers dscp roce
{master:0}[edit]
root@SwitchB# commit
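The array and initiator port commands above differ only in the interface name and the VLAN, so when many ports are involved they can be generated rather than typed by hand. The following is a rough Python sketch that builds the same command strings as the examples in this guide. The function name and its parameters are hypothetical, and the sketch only produces text to review and paste; it does not connect to a switch.

```python
from typing import List


def roce_port_commands(interface: str, vlan: str, mtu: int = 9000,
                       profile: str = "roce") -> List[str]:
    """Build the per-port configuration commands used in this guide.

    interface: switch port name, for example "et-0/0/3:0"
    vlan:      VLAN name created in the global configuration, for example "vlan987"
    profile:   name shared by the congestion notification profile and the
               DSCP classifier ("roce" in this guide)
    """
    switching = f"set interfaces {interface} unit 0 family ethernet-switching"
    return [
        f"delete interfaces {interface} unit 0 family inet",
        f"set interfaces {interface} mtu {mtu}",
        f"set interfaces {interface} ether-options no-flow-control",
        switching,
        f"{switching} interface-mode access",
        f"{switching} interface-mode access vlan members {vlan}",
        f"set class-of-service interfaces {interface} congestion-notification-profile {profile}",
        f"set class-of-service interfaces {interface} unit 0 classifiers dscp {profile}",
    ]


if __name__ == "__main__":
    # Example: the array port and initiator port on Switch B.
    for port in ("et-0/0/0:0", "et-0/0/3:0"):
        print("\n".join(roce_port_commands(port, "vlan987")))
    print("commit")
```

Review the generated commands against the switch's actual port names before pasting them into configuration mode.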
QoS Validation
Once you have completed the setup of the FlashArray and the initiator and have an active NVMe/RoCE connection between the devices, you should be able to see traffic in Queue 3 on the switch ports.
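Use the show interfaces <ifname> extensive command to verify that traffic is being seen on queue 3. In the output, the Transmitted packets counter for queue 3, which is mapped to the roce forwarding class, should be non-zero.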
root@SwitchA> show interfaces et-0/0/0:0 extensive
  Egress queues: 10 supported, 5 in use
  Queue counters:       Queued packets  Transmitted packets      Dropped packets
    0                                0                    0                    0
    3                                0            372829514                    0
    4                                0                    0                    0
    7                                0                    0                    0
    8                                0                    0                    0
  Queue number:         Mapped forwarding classes
    0                   best-effort
    3                   roce
    4                   no-loss
    7                   network-control
    8                   mcast
  Active alarms  : None
  Active defects : None
  PCS statistics                      Seconds
    Bit errors                             0
    Errored blocks                         0
  Ethernet FEC Mode  :                  FEC74
  Ethernet FEC statistics              Errors
    FEC Corrected Errors                    0
    FEC Uncorrected Errors                  0
    FEC Corrected Errors Rate               0
    FEC Uncorrected Errors Rate             0
  MAC statistics:                      Receive         Transmit
    Total octets                    8359523840    1249856972458
    Total packets                    111986021        372750165
    Unicast packets                  111981508        372750003
    Broadcast packets                        0                0
    Multicast packets                     4513              162
    CRC/Align errors                         0                0
    FIFO errors                              0                0
    MAC control frames                       0                0
    MAC pause frames                         0                0
    Oversized frames                         0
    Jabber frames                            0
    Fragment frames                          0
    VLAN tagged frames                       0
    Code violations                          0
  MAC Priority Flow Control Statistics:
    Priority :  0                             0                0
    Priority :  1                             0                0
    Priority :  2                             0                0
    Priority :  3                       1013702                0
    Priority :  4                             0                0
    Priority :  5                             0                0
    Priority :  6                             0                0
    Priority :  7                             0                0
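When many ports need checking, the queue counters can be read programmatically from saved command output. The following is a minimal Python sketch, assuming the output has been captured as text with one queue per line under the "Queue counters:" header. The function names are hypothetical, the sample text is an excerpt of the output shown above, and the exact spacing may differ between Junos releases.

```python
import re
from typing import Dict

# One counter row: queue number, queued, transmitted, dropped.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")

SAMPLE = """\
  Egress queues: 10 supported, 5 in use
  Queue counters:       Queued packets  Transmitted packets      Dropped packets
    0                                0                    0                    0
    3                                0            372829514                    0
    4                                0                    0                    0
  Queue number:         Mapped forwarding classes
    0                   best-effort
    3                   roce
"""


def parse_queue_counters(output: str) -> Dict[int, Dict[str, int]]:
    """Return {queue: {"queued", "transmitted", "dropped"}} from the egress queue table."""
    counters: Dict[int, Dict[str, int]] = {}
    in_table = False
    for line in output.splitlines():
        if line.strip().startswith("Queue counters:"):
            in_table = True
            continue
        if in_table:
            match = ROW.match(line)
            if not match:
                break  # first non-row line ends the table
            queue, queued, sent, dropped = map(int, match.groups())
            counters[queue] = {"queued": queued, "transmitted": sent, "dropped": dropped}
    return counters


def queue_has_traffic(output: str, queue: int = 3) -> bool:
    """True if the given egress queue has transmitted at least one packet."""
    return parse_queue_counters(output).get(queue, {}).get("transmitted", 0) > 0


if __name__ == "__main__":
    print(parse_queue_counters(SAMPLE)[3])
    print(queue_has_traffic(SAMPLE))
```

A non-zero transmitted count on queue 3 together with zero dropped packets is consistent with the RoCE traffic being classified into the no-loss queue.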
Verify the QoS configuration with the command show configuration class-of-service.
root@SwitchA> show configuration class-of-service
classifiers {
    dscp roce {
        forwarding-class roce {
            loss-priority low code-points 011010;
        }
    }
}
forwarding-classes {
    class roce queue-num 3 no-loss pfc-priority 3;
}
congestion-notification-profile {
    roce {
        input {
            dscp {
                code-point 011010 {
                    pfc;
                    mru 4200;
                }
            }
        }
    }
}
interfaces {
    et-0/0/0:0 {
        congestion-notification-profile roce;
        unit 0 {
            classifiers {
                dscp roce;
            }
        }
    }
    et-0/0/1:0 {
        congestion-notification-profile roce;
        unit 0 {
            classifiers {
                dscp roce;
            }
        }
    }
    et-0/0/3:0 {
        congestion-notification-profile roce;
        unit 0 {
            classifiers {
                dscp roce;
            }
        }
    }
    et-0/0/4:0 {
        congestion-notification-profile roce;
        unit 0 {
            classifiers {
                dscp roce;
            }
        }
    }
}
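Each RoCE port needs both the congestion notification profile and the DSCP classifier, and a missing statement is easy to overlook in a long listing. The following Python sketch checks for both, assuming the configuration was saved in set-command form with show configuration class-of-service | display set. The function name is hypothetical, and the sample text is constructed from the commands in this guide rather than captured from a switch.

```python
from typing import Dict, List

# Constructed example: et-0/0/3:0 is deliberately missing its classifier.
SAMPLE_DISPLAY_SET = """\
set class-of-service interfaces et-0/0/0:0 congestion-notification-profile roce
set class-of-service interfaces et-0/0/0:0 unit 0 classifiers dscp roce
set class-of-service interfaces et-0/0/3:0 congestion-notification-profile roce
"""


def missing_qos_statements(display_set: str, interfaces: List[str],
                           profile: str = "roce") -> Dict[str, List[str]]:
    """Return, per interface, the expected QoS statements absent from the config."""
    present = {line.strip() for line in display_set.splitlines()}
    missing: Dict[str, List[str]] = {}
    for ifname in interfaces:
        expected = [
            f"set class-of-service interfaces {ifname} "
            f"congestion-notification-profile {profile}",
            f"set class-of-service interfaces {ifname} unit 0 classifiers dscp {profile}",
        ]
        absent = [stmt for stmt in expected if stmt not in present]
        if absent:
            missing[ifname] = absent
    return missing


if __name__ == "__main__":
    report = missing_qos_statements(SAMPLE_DISPLAY_SET, ["et-0/0/0:0", "et-0/0/3:0"])
    for ifname, statements in report.items():
        print(ifname, "is missing:")
        for stmt in statements:
            print("  " + stmt)
```

An empty result means every listed interface carries both statements; it does not confirm that the global forwarding class, classifier, and profile definitions are themselves correct.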