SAN Guidelines for Maximizing Pure Performance
The SAN is a common component in many customer issues. This article provides a brief overview of our suggested guidelines for achieving the best possible performance or, failing that, for removing your SAN from the list of variables to review when troubleshooting.
Summary of ideal standards:
Applies To: Fibre Channel, iSCSI
When configuring your SAN, it’s important to remember that the more hops you have, the more latency you will see. For best performance, the ideal topology is a “Flat Fabric” where the FlashArray is only one hop away from any applications being hosted on it. For iSCSI, we recommend that you do not add routing to your SAN.
To illustrate, we'll use an actual support case without names. In this example, we were seeing terrible performance from the test host. Latency varied wildly, with several hundred milliseconds of peak latency. Bandwidth never surpassed 100MB/sec.
Assumed topology by Pure Support and customer
Make sure you know the topology. Please consult with your switch vendor's documentation on how to confirm your topology.
Some examples of helpful switch tools:
show interface brief (look for e_ports) or
Many of our customers use a CPU chassis such as a Cisco UCS or an HP c7000. These systems commonly have a number of bladed servers that connect to an embedded switch over a copper bus. All but UCS use a type of "dumb" switch (no zoning) which connects to a core fabric switch (this is true for FC and iSCSI). UCS connects to an additional switch/bridge, a "Fabric Interconnect", and then to a core switch.
Each one of these steps increases oversubscription.
For example, a bladed chassis might have 16 discrete servers. Each of these servers connects to an internal HBA which connects to the embedded switch. This switch takes these 16 servers and performs a form of NAT, forwarding all of their traffic to a smaller number of ports: commonly 4, and as many as 16. These ports log into a core switch, which passes frames over to storage. The oversubscription rate can get quite high if you use a hypervisor for your discrete servers. Add Virtual Machines to each blade, let's say 4 VMs per blade, and what do we end up with?
4VMs X 16 blades = 64 initiators
64 initiators share sixteen 8Gb ports. Those sixteen 8Gb ports are funneled into an embedded switch with eight 8Gb external ports. We now have 8 entrance points for 64 hosts to communicate with storage, backup, virtual devices, etc. On an 8Gb switch, this is eight hosts for each 8Gb port. For daily operations, this is usually fine, but if you have several high-demand systems on this chassis (a database or development systems, for example), this configuration can become a bottleneck. This is the driving force behind 16Gb Fibre Channel and the coming 32Gb standard.
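The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the chassis example; the function name and variables are our own and do not correspond to any Pure or switch-vendor tool.

```python
def oversubscription_ratio(initiators: int, uplink_ports: int) -> float:
    """Initiators sharing each uplink port (higher means more contention)."""
    if uplink_ports <= 0:
        raise ValueError("uplink_ports must be positive")
    return initiators / uplink_ports


# Figures from the chassis example: 4 VMs on each of 16 blades,
# funneled through 8 external ports on the embedded switch.
vms_per_blade = 4
blades = 16
external_ports = 8

initiators = vms_per_blade * blades
ratio = oversubscription_ratio(initiators, external_ports)
print(f"{initiators} initiators over {external_ports} uplinks = {ratio:.0f}:1")
```

Running the same calculation with your own blade, VM, and uplink counts gives a quick sense of how many hosts contend for each external port.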
In one support case, each chassis had only two iSCSI connections to the core switch, providing, in real-world use, substantially less than 20Gb of bandwidth for all 64 hosts.
This configuration is particularly devastating for iSCSI. From VMware's Best Practices [pg 14] (emphasis is mine):
For iSCSI and NFS, make sure that your network topology does not contain Ethernet bottlenecks, where multiple links are routed through fewer links, potentially resulting in oversubscription and dropped network packets. Any time a number of links transmitting near capacity are switched to a smaller number of links, such oversubscription is a possibility.
Recovering from these dropped network packets results in large performance degradation. In addition to time spent determining that data was dropped, the retransmission uses network bandwidth that could otherwise be used for new transactions.
VMware adds this additional tip:
Be aware that with software-initiated iSCSI and NFS the network protocol processing takes place on the host system, and thus these might require more CPU resources than other storage options.
The cumulative impact of additional CPU overhead is another factor when laying out your iSCSI network. In other words, err on the side of too much bandwidth instead of too little.
Applies to: Fibre Channel, iSCSI
Assuming that you have plenty of network ports, please do avail yourself of all of Pure's ports. You will need to make sure that you balance between maximizing connections to the Pure Storage FlashArray, and any host limitation you may have on the number of connections.
Why? Storage devices are often oversubscribed in today's SAN. By adding more physical paths, you help keep oversubscription in check: you provide more pathways, more resiliency, more performance, and mitigation of physical problems, and, last but not least, you take better advantage of our CPU allocation.
How to Check?
Open our UI and click on SYSTEM -> Connections and you should see the below:
Note that at the bottom of the connections we list our own ports as "Target Ports" and show our connection speed. This is a nice way to easily verify if you connected to some rogue port fixed at a lower speed.
Here's the command syntax and output:
    pureuser@myarray> pureport list --initiator
    Initiator WWN  Initiator Portal     Initiator IQN                                       Target    Target WWN  Target Portal        Target IQN
    -              172.28.109.37:52143  iqn.1998-01.com.vmware:vs-ucs3-b200-m2-17-4ccc62f8  CT0.ETH4  -           172.28.109.120:3260  iqn.2010-06.com.purestorage:flasharray.137a6af57d94535c
    -              172.28.109.37:52147  iqn.1998-01.com.vmware:vs-ucs3-b200-m2-17-4ccc62f8  CT0.ETH5  -           172.28.109.121:3260  iqn.2010-06.com.purestorage:flasharray.137a6af57d94535c
    -              172.28.109.37:52151  iqn.1998-01.com.vmware:vs-ucs3-b200-m2-17-4ccc62f8  CT0.ETH6  -           172.28.109.122:3260  iqn.2010-06.com.purestorage:flasharray.137a6af57d94535c
    -              172.28.109.37:52155  iqn.1998-01.com.vmware:vs-ucs3-b200-m2-17-4ccc62f8  CT0.ETH7  -           172.28.109.123:3260  iqn.2010-06.com.purestorage:flasharray.137a6af57d94535c
    -              172.28.109.37:52159  iqn.1998-01.com.vmware:vs-ucs3-b200-m2-17-4ccc62f8  CT1.ETH4  -           172.28.109.124:3260  iqn.2010-06.com.purestorage:flasharray.137a6af57d94535c
    -              172.28.109.37:52163  iqn.1998-01.com.vmware:vs-ucs3-b200-m2-17-4ccc62f8  CT1.ETH5  -           172.28.109.125:3260  iqn.2010-06.com.purestorage:flasharray.137a6af57d94535c
    -              172.28.109.37:52167  iqn.1998-01.com.vmware:vs-ucs3-b200-m2-17-4ccc62f8  CT1.ETH6  -           172.28.109.126:3260  iqn.2010-06.com.purestorage:flasharray.137a6af57d94535c
    -              172.28.109.37:52171  iqn.1998-01.com.vmware:vs-ucs3-b200-m2-17-4ccc62f8  CT1.ETH7  -           172.28.109.127:3260  iqn.2010-06.com.purestorage:flasharray.137a6af57d94535c
    -              172.28.109.38:52144  iqn.1998-01.com.vmware:vs-ucs3-b200-m2-17-4ccc62f8  CT0.ETH4  -           172.28.109.120:3260  iqn.2010-06.com.purestorage:flasharray.137a6af57d94535c
    -              172.28.109.38:52148  iqn.1998-01.com.vmware:vs-ucs3-b200-m2-17-4ccc62f8  CT0.ETH5  -           172.28.109.121:3260  iqn.2010-06.com.purestorage:flasharray.137a6af57d94535c
    -              172.28.109.38:52152  iqn.1998-01.com.vmware:vs-ucs3-b200-m2-17-4ccc62f8  CT0.ETH6  -           172.28.109.122:3260  iqn.2010-06.com.purestorage:flasharray.137a6af57d94535c
    -              172.28.109.38:52156  iqn.1998-01.com.vmware:vs-ucs3-b200-m2-17-4ccc62f8  CT0.ETH7  -           172.28.109.123:3260  iqn.2010-06.com.purestorage:flasharray.137a6af57d94535c
    -              172.28.109.38:52160  iqn.1998-01.com.vmware:vs-ucs3-b200-m2-17-4ccc62f8  CT1.ETH4  -           172.28.109.124:3260  iqn.2010-06.com.purestorage:flasharray.137a6af57d94535c
    -              172.28.109.38:52164  iqn.1998-01.com.vmware:vs-ucs3-b200-m2-17-4ccc62f8  CT1.ETH5  -           172.28.109.125:3260  iqn.2010-06.com.purestorage:flasharray.137a6af57d94535c
    -              172.28.109.38:52168  iqn.1998-01.com.vmware:vs-ucs3-b200-m2-17-4ccc62f8  CT1.ETH6  -           172.28.109.126:3260  iqn.2010-06.com.purestorage:flasharray.137a6af57d94535c
    -              172.28.109.38:52172  iqn.1998-01.com.vmware:vs-ucs3-b200-m2-17-4ccc62f8  CT1.ETH7  -           172.28.109.127:3260  iqn.2010-06.com.purestorage:flasharray.137a6af57d94535c
In this example, we are using an FA-450 that has 8 physical ports. Now, you don't necessarily have to zone/connect all 8 paths to each host like here; you can simply connect 4 ports (2 per controller) to each host. Alternate for other hosts. For example, if you have 20 hosts, 10 can use any 4 ports and the other 10 hosts can use the other ports.
Due to the inherent overhead in iSCSI, we recommend using all possible interfaces per host. You will also want to use at least 8 sessions per host. This happens by default when you attach a host to all 8 Pure ports. If you do not do this, or if your model of FlashArray has fewer than 8 ports, configure for multiple sessions per host.
Here's the main point: you can see the same Initiator IQN (and it would be Initiator WWN for Fibre Channel) across 8 ports on Pure, the "Target IQNs."
If you see a host, but it is not mapped to any target port, this means that someone configured a host, manually entered an IQN or WWN, but that host has not logged into Pure. In other words, that host is offline for some reason.
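If you have many hosts, the same check can be scripted. The following is a minimal Python sketch that assumes you have already extracted (initiator, target port) pairs from the connection listing yourself; the function names, the `expected_ports` parameter, and the second host's IQN are hypothetical and are not part of any Pure tool.

```python
from collections import defaultdict


def ports_per_initiator(rows):
    """Map each initiator (IQN or WWN) to the set of array ports it is logged into."""
    seen = defaultdict(set)
    for initiator, target_port in rows:
        seen[initiator].add(target_port)
    return dict(seen)


def under_connected(rows, expected_ports=8):
    """Return {initiator: port_count} for initiators on fewer than expected_ports."""
    return {
        initiator: len(ports)
        for initiator, ports in ports_per_initiator(rows).items()
        if len(ports) < expected_ports
    }


# Host from the listing above: logged into ETH4-ETH7 on both controllers.
esx_host = "iqn.1998-01.com.vmware:vs-ucs3-b200-m2-17-4ccc62f8"
rows = [(esx_host, f"CT{ct}.ETH{eth}") for ct in (0, 1) for eth in (4, 5, 6, 7)]

# Hypothetical second host that only reaches one port per controller.
other_host = "iqn.1998-01.com.example:host-b"
rows += [(other_host, "CT0.ETH4"), (other_host, "CT1.ETH4")]

print(under_connected(rows))
```

A host that was configured but never logged in produces no rows at all, so it will not show up in this report; compare the result against your list of configured hosts to catch those.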
A surprisingly large number of performance cases have been resolved by replacing cables. Touching the ends of fibre optic cables, or letting them dangle in a rack (we all do it) leads to contamination. The SFP+ connections for iSCSI are *not* immune to this and are just as unforgiving.
You can clean the cable tips, but most people seem to just replace the entire cable. Physical layer errors are insidious: they can destroy performance during peak loads and are often overlooked, yet they are the easiest fix for any performance problem.
The best way to find them is to log into your switch and check for physical layer errors. For 10Gb switch vendors the reporting here varies; some report very little. But almost all 10Gb switch vendors report CRCs. Any port with physical layer errors (like CRCs) should have the cable cleaned or replaced. If that doesn't work, test or replace your switch SFPs. Avoid patch panels if at all possible; if that is not possible, be sure to bypass the patch panel for testing.
For Fibre Channel, Cisco has limited diagnostics for physical layer errors. Brocade reports every error in two ways: per port (portstatsshow, like Cisco's "show interface details") and in a full Excel-like table named "porterrshow."
Porterrshow is a powerful troubleshooting tool only available through Brocade's CLI:
    porterrshow :
         frames        enc    crc    crc    too    too    bad    enc    disc   link   loss   loss   frjt   fbsy   c3timeout     pcs
         tx     rx     in     err    g_eof  shrt   long   eof    out    c3     fail   sync   sig                  tx     rx     err
    0:   51.4m  3.8m   0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0
    1:   173.0m 89.5m  0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0
Between the columns "frames rx" and "enc out" are your physical layer errors. In the above example, the paths are pristine.
- For Cisco and Brocade, these numbers are only as good as the switch uptime *or* since stats were last cleared.
- Older versions of Cisco NX-OS gave no indication in command output that stats had been cleared; newer versions report this under "show interface details".
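When a switch has hundreds of ports, scanning porterrshow by eye gets tedious. As a rough sketch only, the Python below flags ports with non-zero physical layer counters in a single porterrshow data row. The column names are our own labels for the two-line header shown above, we treat "enc in" through "enc out" inclusive as the physical layer columns, and the second sample row is invented for illustration.

```python
# Our own labels for the 18 porterrshow columns, in the order shown above.
COLUMNS = [
    "frames_tx", "frames_rx", "enc_in", "crc_err", "crc_g_eof", "too_shrt",
    "too_long", "bad_eof", "enc_out", "disc_c3", "link_fail", "loss_sync",
    "loss_sig", "frjt", "fbsy", "c3timeout_tx", "c3timeout_rx", "pcs_err",
]
# Assumption: the physical layer columns run from enc_in through enc_out.
PHYSICAL = COLUMNS[2:9]


def parse_row(line):
    """Split one data row such as '0: 51.4m 3.8m 0 ...' into (port, counters)."""
    port, _, rest = line.partition(":")
    values = rest.split()
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} counters, got {len(values)}")
    return int(port), dict(zip(COLUMNS, values))


def physical_errors(line):
    """Return (port, {column: raw_value}) for non-zero physical layer counters."""
    port, counters = parse_row(line)
    return port, {name: counters[name] for name in PHYSICAL if counters[name] != "0"}


clean = "0: 51.4m 3.8m 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"   # row from the output above
dirty = "2: 10.1m 9.9m 0 1.2k 0 0 0 0 34 0 0 0 0 0 0 0 0 0"  # invented example row

print(physical_errors(clean))
print(physical_errors(dirty))
```

The counters are kept as the raw strings the switch prints rather than converted to numbers, since the goal here is only to spot anything that is not zero.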
We do collect physical layer errors on our ports, and these are pulled from system statistics in hex format. That is not very user-friendly, to be sure, and we're working to roll this data out so that you can review port errors in our UI. In the meantime, be assured that the array's physical layer errors are one of the first items we review for performance escalations, and we're happy to provide these stats anytime you need them (for no additional charge, we'll also convert the data from hex to decimal).
Port Connection Speeds
Applies to: Fibre Channel, iSCSI
Often customers aren't aware of the rogue switch port that was hard fixed to 4Gb; or that the company's mission-critical database server, which cannot suffer any downtime, is still using 4Gb HBAs with outdated drivers. We have caught the occasional 2Gb host as well.
The item to bear in mind is that an SSD-based array with 8Gb or 16Gb HBAs, using Fibre Channel, can achieve extraordinary bandwidth with sub-millisecond latency. When you zone this to a 4Gb host, there's a good chance that the host, unable to keep up, will cause back pressure, forcing frame discards further down the path.
The best way to avoid performance problems with slower hosts or switches would be to:
- Add more physical paths if possible and avail yourself of multipathing, as per your host's best practices.
- If you must use hardware that is two full port speeds below Pure (2Gb in an 8Gb SAN or 4Gb in a 16Gb SAN), you may need to fix the switch ports that Pure connects to at one speed down.
Here's an example:
Let's say you just bought an FA-450 with 16Gb HBAs to put into your brand new 16Gb SAN (awesome!). The hosts you are using to migrate data over to Pure are 4Gb. The switch is now in a position to manage 16Gb speeds for Pure, but 4Gb speeds for your host. This can lead to enormous backpressure causing frame discards to Pure, and potentially to all connected devices! (Each frame discard can trigger up to 12 seconds of paused IO between devices for error recovery between ra_tov and ed_tov fabric values).
Therefore, you may need to down-clock port speeds at the switch where Pure connects to 8Gb. This should sufficiently stop discards and allow you to, at least, push IO at the host speed.
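The rule of thumb above can be written down as a small Python sketch. The speed list and function names are our own illustration of the guidance in this section, not a Pure or switch-vendor utility, and the result is only a starting point for your own testing.

```python
# Fibre Channel speed generations in Gb, slowest to fastest.
FC_SPEEDS_GB = [1, 2, 4, 8, 16, 32]


def generations_apart(array_gb, host_gb):
    """How many speed generations the host is behind the array."""
    return FC_SPEEDS_GB.index(array_gb) - FC_SPEEDS_GB.index(host_gb)


def suggested_array_port_speed(array_gb, host_gb):
    """Speed (Gb) to fix the array-facing switch ports at, or None to leave them alone.

    Follows the guidance above: if the host is two or more generations
    slower than the array, step the array-facing ports one speed down.
    """
    if generations_apart(array_gb, host_gb) >= 2:
        return FC_SPEEDS_GB[FC_SPEEDS_GB.index(array_gb) - 1]
    return None


print(suggested_array_port_speed(16, 4))   # 16Gb array, 4Gb migration hosts
print(suggested_array_port_speed(16, 8))   # only one generation apart
```

For the migration example, this suggests fixing the array-facing switch ports at 8Gb, which matches the recommendation above.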
To check port speeds for Pure:
    pureuser@problemchild> purehw list
    Name      Status  Identify  Slot  Index  Speed       Temperature  Details
    CT0       ok      off       -     0      -           -
    CT0.ETH0  ok      -         -     0      1.00 Gb/s   -
    CT0.ETH1  ok      -         -     1      1.00 Gb/s   -
    ...
    CT0.ETH8  ok      -         5     8      10.00 Gb/s  -
    CT0.ETH9  ok      -         5     9      10.00 Gb/s  -
    ...
    CT0.FC0   ok      -         6     0      8.00 Gb/s   -
    CT0.FC1   ok      -         6     1      0.00 b/s    -
    CT0.FC2   ok      -         7     2      0.00 b/s    -
    CT0.FC3   ok      -         7     3      8.00 Gb/s   -
Click on SYSTEM, Host Connections, and then look at the bottom of the main page. These are the speeds we've established with the switch, as well as our WWNs and IQNs which you can copy and paste if need be.
The best way to check port speeds for your switch is with the following (iSCSI left out due to the abundance of vendors):
Brocade - switchshow
Switchshow is a terrific command, displaying a list of all connected devices; the port speeds, port status, physical location, Fibre Channel address, etc.
    switchshow:
    ...
    switchBeacon:   OFF

    Index Slot Port Address Media Speed State     Proto
    ===================================================
      0    1    0   6a0000   id    N4   Online      FC  F-Port  50:06:0e:80:10:1a:dd:e4
      1    1    1   6a0100   id    N4   No_Light    FC
      2    1    2   6a0200   id    N4   Online      FC  F-Port  52:4a:93:7e:27:89:c5:00
    ...
    137    1   25   6a8900   id    N1   Online      FC  F-Port  50:06:0b:00:00:07:f2:f0
    138    1   26   6a8a00   id    N4   Online      FC  F-Port  21:00:00:24:ff:0d:0f:ab
    139    1   27   6a8b00   id    N2   Online      FC  F-Port  50:06:0b:00:00:39:6a:8e
    140    1   28   6a8c00   id    N2   Online      FC  F-Port  50:06:0b:00:00:39:6a:8c
    141    1   29   6a8d00   id    N4   Online      FC  F-Port  21:00:00:24:ff:02:5b:6f
    142    1   30   6a8e00   id    N4   Online      FC  F-Port  21:00:00:e0:8b:85:96:1e
    143    1   31   6a8f00   id    N4   Online      FC  F-Port  21:00:00:24:ff:0d:0e:03
     16    2    0   6a1000   id    N4   Online      FC  F-Port  50:06:0e:80:10:1a:dd:e5
Much of the output has been snipped, but take a look at the Speed column. The "N" before the number means that the port is set to auto-negotiate, and the following number is the speed that the device settled on.
In this example, Pure is on port 2 (we always start with a WWN of 52) and it is set to N4. So this is likely a 4Gb switch (as we are only 8 or 16Gb). Glance downward and notice the various port speeds. This customer hopefully does not intend to use the 1Gb device, as it is two generations behind 4Gb. Without using all eight FC ports on Pure, we would expect this customer to be bandwidth limited.
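To pick out slow ports from a long switchshow listing, something like the following Python sketch may help. It assumes the whitespace-separated layout shown above, with the speed in the sixth field and the state in the seventh; the regular expression also accepts a fixed-speed form such as "4G", which is our assumption about how non-negotiated ports are printed. The function name and `min_gb` parameter are hypothetical.

```python
import re

# "N4" = auto-negotiated to 4Gb. Accepting a trailing "G" is an assumption
# about how fixed-speed ports are displayed.
SPEED = re.compile(r"^N?(\d+)G?$")


def slow_ports(lines, min_gb):
    """Return (index, speed_gb) for online ports running below min_gb."""
    slow = []
    for line in lines:
        fields = line.split()
        # Skip headers, separators, and ports that are not online.
        if len(fields) < 8 or fields[6] != "Online":
            continue
        match = SPEED.match(fields[5])
        if match and int(match.group(1)) < min_gb:
            slow.append((int(fields[0]), int(match.group(1))))
    return slow


# A few rows copied from the switchshow output above.
sample = """\
Index Slot Port Address Media Speed State     Proto
  1    1    1   6a0100   id    N4   No_Light    FC
  2    1    2   6a0200   id    N4   Online      FC  F-Port  52:4a:93:7e:27:89:c5:00
137    1   25   6a8900   id    N1   Online      FC  F-Port  50:06:0b:00:00:07:f2:f0
139    1   27   6a8b00   id    N2   Online      FC  F-Port  50:06:0b:00:00:39:6a:8e
140    1   28   6a8c00   id    N2   Online      FC  F-Port  50:06:0b:00:00:39:6a:8c
""".splitlines()

print(slow_ports(sample, min_gb=4))
```

Raising `min_gb` to 8 on this sample would also flag the Pure port on index 2, which is the bandwidth limitation described above.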
Cisco - show int brief
    `show interface brief`
    -------------------------------------------------------------------------------
    Interface  Vsan   Admin  Admin   Status   SFP   Oper  Oper    Port
                      Mode   Trunk                  Mode  Speed   Channel
                             Mode                         (Gbps)
    -------------------------------------------------------------------------------
    fc1/29     11     auto   on      up       swl   F     8       --
    fc1/30     11     auto   on      up       swl   F     8       --
    fc1/31     11     auto   on      up       swl   F     8       131
    fc1/32     11     auto   on      up       swl   F     8       131
Cisco reports the port speeds, but you'll have to make a note separately as to what connects to what interface (use "show flogi database" to know which WWN is connected to which Interface). In this example, all devices are connected at 8Gb.
Applies to: Fibre Channel
Zone any single initiator to as many Pure ports as you like (for a dual fabric environment, use 4 ports through each fabric to each host port WWN).
Back in the day, FC switch vendors recommended 1 host port to 1 storage port per zone. This was when an RSCN was sent to all devices and when a large switch was 32 ports. We don't recommend this, unless you have the desire and time to manage 4 to 8 zones per device.
Indeed, Brocade and Cisco no longer suggest 1 to 1 zoning:
Brocade: (takes you to a pdf of best practices, below quote is taken from page 11).
- Use single initiator single target or single initiator and multiple target zone sets. In a large fabric, zoning by single HBA requires the creation of possibly hundreds of zones; however, each zone contains only a few members. Zone changes affect the smallest possible number of devices, minimizing the impact of an incorrect zone change. This zoning philosophy is the preferred method and avoids RSCN performance concerns with multiple initiators in the same zone.
Cisco: (From MDS Configuration Guide)
The following guidelines must be considered when creating zone members:
- Configuring the same initiator to multiple targets is accepted.
- Configuring multiple initiators to multiple targets is not recommended.
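To make the single-initiator, multiple-target layout concrete, here is a small Python sketch that builds one zone per host port, each containing that initiator plus the array ports it should reach. The zone naming scheme, the host names, and the second target WWN are hypothetical; the other WWNs are reused from the switchshow example purely for illustration, and the output is a plain dictionary rather than switch configuration syntax.

```python
def single_initiator_zones(initiators, targets):
    """Build one zone per initiator: that initiator plus every listed target."""
    return {
        f"z_{name}": [wwn, *targets]
        for name, wwn in initiators.items()
    }


# Host port WWNs (names are hypothetical; WWNs reused from the switchshow example).
initiators = {
    "hostA_hba0": "21:00:00:24:ff:0d:0f:ab",
    "hostB_hba0": "21:00:00:24:ff:02:5b:6f",
}

# Array ports on this fabric; the second WWN is made up for the example.
targets = [
    "52:4a:93:7e:27:89:c5:00",
    "52:4a:93:7e:27:89:c5:10",
]

for zone, members in single_initiator_zones(initiators, targets).items():
    print(zone, members)
```

Each zone holds exactly one initiator, so a zone change touches only that host, while every host still sees multiple array ports.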
Applies to: iSCSI
    pureuser@mv-sup-fa420> purenetwork setattr ct0.eth2 --mtu 9000
    Name      Status   Address  Mask  Gateway  MTU   MAC                Speed      Services  Slaves
    ct0.eth2  enabled                          9000  74:86:7a:d4:e5:1a  1.00 Gb/s  iscsi     -