Linux Recommended Settings

To ensure the best performance with the Pure Storage FlashArray, please use this guide for the configuration and implementation of Linux hosts in your environment. These recommendations apply to the versions of Linux that we have certified as per our Compatibility Matrix.

Important Notice

Due to a change in path priority detection in the latest versions of multipath-tools, all customers must add the statement detect_prio "no" to their multipath.conf. Otherwise, the default configuration will try to override the 'alua' prioritizer and replace it with 'sysfs', which could cause problems during array firmware updates.
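
As an illustration, assuming the setting is applied globally rather than per device, the defaults section of /etc/multipath.conf might look like this (a minimal sketch; the option can also be placed in the PURE device section):

defaults {
        detect_prio    "no"
}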

Queue Settings

We recommend two changes to the queue settings. The first selects the 'noop' I/O scheduler, which has been shown to get better performance with lower CPU overhead than the default schedulers (usually 'deadline' or 'cfq'). The second change eliminates the collection of entropy for the kernel random number generator, which has high CPU overhead when enabled for devices supporting high IOPS.

Manually Changing Queue Settings 

This is not required unless LUNs are already in use with the wrong settings.

These settings can be safely changed on a running system by first locating the Pure LUNs:

grep PURE /sys/block/sd*/device/vendor

And writing the desired values into sysfs files:

echo noop > /sys/block/sdx/queue/scheduler
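
Similarly, to disable entropy collection on the same device (sdx is used here only as an example Pure LUN):

echo 0 > /sys/block/sdx/queue/add_random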

An example for loop is shown here to quickly set all Pure LUNs to the desired 'noop' elevator:

# Set the noop scheduler on every Pure device reported by lsscsi
for disk in $(lsscsi | grep PURE | awk '{print $6}'); do
    echo noop > /sys/block/${disk##/dev/}/queue/scheduler
done
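
The same loop pattern can be used to apply the other recommended queue settings immediately (a sketch; the values match the udev rules shown later in this guide):

for disk in $(lsscsi | grep PURE | awk '{print $6}'); do
    dev=${disk##/dev/}
    # Disable entropy collection and steer completions to the submitting CPU
    echo 0 > /sys/block/${dev}/queue/add_random
    echo 2 > /sys/block/${dev}/queue/rq_affinity
done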

All changes in this section take effect immediately, without rebooting, on RHEL 5 and higher; RHEL 4 releases require a reboot. These changes will not persist across reboots unless they are added to the udev rules described below.

Note that the active scheduler is shown in brackets; [noop] designates noop as the scheduler in use.

[robm@robm-rhel7 ~]$ cat /sys/block/sdb/queue/scheduler
[noop] deadline cfq

Applying Queue Settings with udev

Once the I/O scheduler elevator has been set to 'noop', it is often desirable to keep the setting persistent across reboots.

Step 1: Create the Rules File

Create a new file in the following location (for each respective OS). The Linux OS will use the udev rules to set the elevators after each reboot.

RHEL:    /etc/udev/rules.d/99-pure-storage.rules
Ubuntu:  /lib/udev/rules.d/99-pure-storage.rules

Step 2: Add the Following Entries to the Rules File  (Version Dependent)

The following entries automatically set the recommended parameters each time the system boots. Add them to the rules file created in Step 1, ensuring each entry is on a single line with no carriage returns:

Note that in RHEL 8.x 'noop' no longer exists and has been replaced by 'none'.

RHEL 8.x and SuSE 15.2 and higher
# Recommended settings for Pure Storage FlashArray.
# Use none scheduler for high-performance solid-state storage for SCSI devices
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/scheduler}="none"
ACTION=="add|change", KERNEL=="dm-[0-9]*", SUBSYSTEM=="block", ENV{DM_NAME}=="3624a937*", ATTR{queue/scheduler}="none"

# Reduce CPU overhead due to entropy collection
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/add_random}="0"
ACTION=="add|change", KERNEL=="dm-[0-9]*", SUBSYSTEM=="block", ENV{DM_NAME}=="3624a937*", ATTR{queue/add_random}="0"

# Spread CPU load by redirecting completions to originating CPU
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/rq_affinity}="2"
ACTION=="add|change", KERNEL=="dm-[0-9]*", SUBSYSTEM=="block", ENV{DM_NAME}=="3624a937*", ATTR{queue/rq_affinity}="2"

# Set the HBA timeout to 60 seconds
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{device/timeout}="60"
RHEL 6.x, 7.x
# Recommended settings for Pure Storage FlashArray.

# Use noop scheduler for high-performance solid-state storage
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/scheduler}="noop"

# Reduce CPU overhead due to entropy collection
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/add_random}="0"

# Spread CPU load by redirecting completions to originating CPU
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/rq_affinity}="2"

# Set the HBA timeout to 60 seconds
ACTION=="add", SUBSYSTEMS=="scsi", ATTRS{model}=="FlashArray      ", RUN+="/bin/sh -c 'echo 60 > /sys/$DEVPATH/device/timeout'"

Please note that 6 spaces are needed after "FlashArray" under "Set the HBA timeout to 60 seconds" above for the rule to take effect.

RHEL 5.x
# Recommended settings for Pure Storage FlashArray.
 
# Use noop scheduler for high-performance solid-state storage
ACTION=="add|change", KERNEL=="sd*[!0-9]|", SYSFS{vendor}=="PURE*", RUN+="/bin/sh -c 'echo noop > /sys/$devpath/queue/scheduler'" 

It is expected behavior that you only see the settings take effect for the sd* devices. The dm-* devices will not reflect the change directly but will inherit it from the sd* devices that make up their paths.

Regarding Large I/O Size Requests and Buffer Exhaustion

See KB FlashArray: Large I/O Size Requests and I/O Buffers

Maximum IO Size Settings

The maximum allowed size of an I/O request, in kilobytes, is determined by the max_sectors_kb setting in sysfs. This restricts the largest I/O size that the OS will issue to a block device. The Pure Storage FlashArray can handle a maximum of 4 MB writes. Therefore, make sure that the maximum allowed I/O size matches this limit. Check your current setting; as long as it does not exceed 4096, no change is needed.

In some cases, the maximum I/O size setting is not honored, and the host generates writes over the 4 MB maximum. If you see errors like the following, the I/O size might be the problem:

end_request: critical target error, dev dm-14, sector 158686242
Buffer I/O error on device dm-15, logical block 19835776
lost page write due to I/O error on dm-15

Though the Pure Storage FlashArray is designed to service I/O with consistently low latency, there are error conditions that can cause much longer latencies, and it is therefore important to ensure dependent servers and applications are tuned appropriately to ride out these error conditions without issue. By design, in the worst-case recoverable error condition, the FlashArray can take up to 60 seconds to service an individual I/O.

To accommodate this, set the SCSI device timeout to 60 seconds. For versions below RHEL 6, you can add the following command(s) into rc.local (one per Pure device):

echo 60 > /sys/block/<Dev_name>/device/timeout

The default timeout for normal file system commands is 60 seconds when udev is being used. If udev is not in use, the default timeout is 30 seconds. If you are running RHEL 6+, and want to ensure the rules persist, then use the udev method. 
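
To verify the current SCSI device timeout on a given device (sdx is used here only as an example Pure LUN):

cat /sys/block/sdx/device/timeout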
 

Verify the Current Setting

If the value is  ≤ 4096, then no action is necessary. However, if this value is > 4096, we recommend that you change the max to 4096. 
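
For example, you can check the current value for a given device (sdx is used here only as an example Pure LUN) with:

cat /sys/block/sdx/queue/max_sectors_kb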

Changing the Maximum Value 

Reboot Persistent

We recommend that you add the value to your UDEV rules file (99-pure-storage.rules) created above. This ensures that the setting persists through a reboot. To change that value please do the following: 

  1. Add this line to your 99-pure-storage.rules file:
    ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/max_sectors_kb}="4096"
    

    You can use this command to add it:

    echo 'ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/max_sectors_kb}="4096"' >> /etc/udev/rules.d/99-pure-storage.rules

     NOTE: The location of your rules file may be different depending on your OS version, so please double check the command before running it. 

  2. Reboot the host. 
  3. Check the value again.

Immediate Change but Won't Persist Through Reboot

This command should only be run if you are sure no running services depend on the volume; otherwise, you risk an application crash.

If you need to make the change immediately, but cannot wait for a maintenance window to reboot, you can also change the setting with the following command: 

echo  # > /sys/block/sdz/queue/max_sectors_kb

Substitute # with a number equal to or less than 4096 (default).

Recommended DM-Multipath Settings

Sample multipath.conf

The following multipath.conf file has been tested with recent versions of RHEL 8. It provides settings for volumes on FlashArray exposed via either SCSI or NVMe. Prior to use, verify the configuration with multipath -t. Some settings may be incompatible with older distributions; we list some known incompatibilities and workarounds below.

defaults {
        polling_interval       10
}


devices {
    device {
        vendor                      "NVME"
        product                     "Pure Storage FlashArray"
        path_selector               "queue-length 0"
        path_grouping_policy        group_by_prio
        prio                        ana
        failback                    immediate
        fast_io_fail_tmo            10
        user_friendly_names         no
        no_path_retry               0
        features                    0
        dev_loss_tmo                60
    }
    device {
        vendor                   "PURE"
        product                  "FlashArray"
        path_selector            "service-time 0"
        hardware_handler         "1 alua"
        path_grouping_policy     group_by_prio
        prio                     alua
        failback                 immediate
        path_checker             tur
        fast_io_fail_tmo         10
        user_friendly_names      no
        no_path_retry            0
        features                 0
        dev_loss_tmo             600
    }
}
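
After installing or editing /etc/multipath.conf, reload the running daemon so the new settings take effect. A minimal sketch, assuming a systemd-based host such as RHEL 7/8:

# Dump the merged configuration to confirm it parses cleanly
multipath -t

# Reload the multipath daemon so it re-reads /etc/multipath.conf
systemctl reload multipathd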

Setting compatibility notes

  • Path selectors: as listed in the sample above, Pure recommends the use of queue-length 0 with NVMe and service-time 0 with SCSI, which improve performance in situations where paths have differing latencies by biasing I/Os towards paths that are servicing I/O more quickly. Older kernels (before RHEL 6.2/before SUSE 12) may not support these path selectors and should specify path_selector "round-robin 0" instead.
  • Path prioritizers (ALUA for SCSI, and ANA for NVMe) and failback immediate must be enabled on hosts connected to arrays configured in an ActiveCluster.
    • The ANA path prioritizer for NVMe is a relatively recent feature (RHEL 8), and older distributions may not support it. In non-ActiveCluster configurations, it can be safely disabled by removing the line prio ana and replacing path_grouping_policy group_by_prio with path_grouping_policy multibus.
  • Please note that fast_io_fail_tmo and dev_loss_tmo do not apply to iSCSI.
  • Please note that the above settings can differ based on the use case. For example, if the user has the RHEL OpenStack Cinder driver configured, the settings can differ; so, before making recommendations, ask the customer whether they have anything specific configured or whether it is just a standard Linux host.
  • If multipath nodes are not showing up on the host after a rescan, you may need to add find_multipaths yes to the defaults section above (see the sketch after this list). This is the case for some hosts which boot off a local non-multipath disk.
  • As per https://access.redhat.com/solutions/3234761, RHV-H multipath configuration must include user_friendly_names no.
  • As per https://access.redhat.com/site/solutions/110553, running DM-multipath along with EMC PowerPath is not a supported configuration and may result in kernel panics on the host.
  • Consult man 5 multipath.conf and/or the RHEL Documentation before making modifications to the configuration.
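
A minimal sketch of the defaults section with find_multipaths enabled, which may be needed for hosts that boot off a local non-multipath disk:

defaults {
        polling_interval       10
        find_multipaths        yes
}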

Verifying DM-Multipathd Configuration

After creating and connecting some volumes on the FlashArray to the host, run multipath -ll to check the configuration. The below output was obtained by creating two volumes and connecting the first to the host via NVMe, and the second through SCSI.

[root@init116-13 ~]# multipath -ll
eui.00292fd80c2afd4724a9373400011425 dm-4 NVME,Pure Storage FlashArray
size=2.0T features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=50 status=active
  |- 2:2:1:70693 nvme2n1 259:0 active ready running
  |- 3:1:1:70693 nvme3n1 259:1 active ready running
  |- 6:0:1:70693 nvme6n1 259:2 active ready running
  `- 4:3:1:70693 nvme4n1 259:3 active ready running
3624a9370292fd80c2afd473400011424 dm-3 PURE,FlashArray
size=1.0T features='0' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 6:0:2:1     sdd     8:48  active ready running
  |- 6:0:3:1     sdf     8:80  active ready running
  |- 7:0:3:1     sdl     8:176 active ready running
  `- 7:0:2:1     sdj     8:144 active ready running

Note the policy='queue-length 0' and policy='service-time 0' which indicate the active path selection policies. These should match the path selection policy settings from the configuration file.

To check if path prioritizers are working correctly in an ActiveCluster environment, create a stretched volume and set a preferred array for the host as described in ActiveCluster: Optimizing Host Performance with Array Preferences. The output of multipath -ll should then look similar to the following example.

# multipath -ll
3624a9370292fd80c2afd473400011426 dm-2 PURE,FlashArray
size=3.0T features='0' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 6:0:2:2     sde     8:64  active ready running
| |- 6:0:3:2     sdg     8:96  active ready running
| |- 7:0:2:2     sdk     8:160 active ready running
| `- 7:0:3:2     sdm     8:192 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 6:0:0:1     sdb     8:16  active ready running
  |- 6:0:1:1     sdc     8:32  active ready running
  |- 7:0:0:1     sdh     8:112 active ready running
  `- 7:0:1:1     sdi     8:128 active ready running

Notice the two distinct groups of paths. The paths to the preferred array (SCSI target numbers 2 and 3) have priority 50, while the paths to the non-preferred array (SCSI target numbers 0 and 1) have priority 10.

Excluding Third-Party vendor LUNs from DM-Multipathd

There are no certification requirements for storage hardware systems with Oracle Linux KVM. Oracle Linux KVM uses kernel interfaces to communicate with storage hardware systems, and does not depend on an application programming interface (API).

When systems have co-existing multipathing software, it is good practice to exclude devices from control by one multipathing package so that they can be controlled by the other.

The following is an example of using DM-Multipathd to blacklist LUNs from a third-party vendor. The syntax blocks DM-Multipathd from controlling the LUNs that are "blacklisted".

The following can be added to the 'blacklist' section of the multipath.conf file.

blacklist {
    device {
        vendor  "XYZ.*"
        product ".*"
    }

    device {
        vendor  "ABC.*"
        product ".*"
    }
}

Space Reclamation

Make sure that space reclamation is configured on your Linux host so that you do not run out of space. For more information, please see this KB: Reclaiming Space on Linux.
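
As an illustration only (consult the KB above for the supported procedure), space on a mounted filesystem is commonly reclaimed on Linux with fstrim; the mount point below is an example:

# Issue an on-demand TRIM/UNMAP for a mounted filesystem
fstrim -v /mnt/pure-vol

# Or enable the periodic timer shipped with util-linux
systemctl enable --now fstrim.timer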

ActiveCluster

Conditional recommendations for Linux servers:

  • If ActiveCluster does not influence multipath.conf on Linux (because the Linux hosts are not configured in ActiveCluster), the recommended multipath configuration per RHEL version should be kept up to date.
  • If ActiveCluster does influence multipath.conf on Linux, there should be extra information and examples for Linux with ActiveCluster, kept separate from the existing recommended multipath configuration per RHEL version.

Additional multipath settings are required for ActiveCluster. Please see ActiveCluster Requirements and Best Practices.

ActiveDR

SCSI Unit Attentions

The Linux kernel has been enhanced to enable userspace to respond to certain SCSI Unit Attention conditions received from SCSI devices via the udev event mechanism. The FlashArray, on version 5.0 and later, supports the following SCSI Unit Attentions:

Description                         ASC    ASCQ
CAPACITY DATA HAS CHANGED           0x2A   0x09
ASYMMETRIC ACCESS STATE CHANGED     0x2A   0x06
REPORTED LUNS DATA HAS CHANGED      0x3F   0x0E

With these SCSI Unit Attentions, it is possible to have the Linux initiator auto-rescan on these storage configuration changes. Auto-rescan support in RHEL/CentOS requires the libstoragemgmt-udev package. Installing this package adds a udev rule, 90-scsi-ua.rules. Uncomment the supported Unit Attentions and reload the udev service to pick up the new rules:

[root@host ~]# cat 90-scsi-ua.rules
#ACTION=="change", SUBSYSTEM=="scsi", ENV{SDEV_UA}=="INQUIRY_DATA_HAS_CHANGED", TEST=="rescan", ATTR{rescan}="x"
ACTION=="change", SUBSYSTEM=="scsi", ENV{SDEV_UA}=="CAPACITY_DATA_HAS_CHANGED", TEST=="rescan", ATTR{rescan}="x"
#ACTION=="change", SUBSYSTEM=="scsi", ENV{SDEV_UA}=="THIN_PROVISIONING_SOFT_THRESHOLD_REACHED", TEST=="rescan", ATTR{rescan}="x"
#ACTION=="change", SUBSYSTEM=="scsi", ENV{SDEV_UA}=="MODE_PARAMETERS_CHANGED", TEST=="rescan", ATTR{rescan}="x"
ACTION=="change", SUBSYSTEM=="scsi", ENV{SDEV_UA}=="REPORTED_LUNS_DATA_HAS_CHANGED", RUN+="scan-scsi-target $env{DEVPATH}"


The following udevadm command causes all of the rules in the rules.d directory to be triggered immediately. The customer needs to take extreme caution when running this command because it may crash the host or have other unintended consequences. We recommend the customer reboots during a change control window if at all possible.


[root@host ~]# udevadm control --reload-rules && udevadm trigger
Boot from SAN

If you are using a LUN to boot from SAN, you need to ensure the changes in your configuration files are applied upon rebooting. This is done by rebuilding the initial ramdisk (initrd or initramfs) to include the proper kernel modules, files and configuration directives after the configuration changes have been made. As the procedure slightly varies depending on the host, we recommend that you refer to your vendor's documentation for the proper procedure.

You can check whether the required device-mapper multipath modules are included in the current initramfs, for example:

lsinitrd /boot/initramfs-$(uname -r).img | grep dm

An example of a file that, if missing, could result in a failure to boot:

...(kernel build)/kernel/drivers/md/dm-round-robin.ko

When rebuilding the initial ramdisk, you need to confirm that the necessary dependencies are in place before rebooting the host to avoid any errors during boot. Refer to your vendor's documentation for specific commands to confirm this information.
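
As an illustration only (the exact procedure varies by distribution, so refer to your vendor's documentation), the initramfs on RHEL-family hosts is commonly rebuilt with dracut:

# Rebuild the initramfs for the running kernel, overwriting the existing image
dracut --force /boot/initramfs-$(uname -r).img $(uname -r)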