
Oracle Recommended Settings for FlashArray

Please be aware of the following issue when configuring Oracle: Oracle ASM Potential Issue That Can Cause Crashes

The principal difference between configuring database storage on a Pure Storage FlashArray instead of on spinning disks is that virtually all of your architecture choices are centered on manageability, not performance. Specifically, none of the following factors are relevant on a Pure Storage array:

  • Stripe width and depth
  • RAID level (mirroring)
  • Intelligent Data Placement (short stroking)
  • O/S and database block size
  • ASM vs. FileSystem

Striping refers to distributing files across multiple hard drives to enable parallel access and to maximize IOPS. A Pure Storage array consists of 22 solid state disks per shelf, and the Purity Operating Environment automatically distributes data across all drives in the array using an algorithm designed to optimize performance and provide redundancy. In other words, the striping is automatic.

The Pure Storage redundancy technology is called RAID-HA, and it is designed specifically to protect against the three failure modes specific to flash storage: device failure, bit errors, and performance variability. You don't need another form of RAID protection, so you don’t need to compromise capacity or performance for data protection. RAID is automatic.

Just as striping and mirroring are irrelevant on a Pure Storage array, so is block size. Pure Storage is based on a fine-grained 512-byte geometry, so there are no block alignment issues as you might encounter in arrays designed with, for example, a 4KB geometry. Another benefit is a substantially higher deduplication rate than seen on other arrays offering data reduction.

Other flash vendors have designed their solutions on the new Advanced Format (AF) Technology, which allows for 4KB physical sector sizes instead of the traditional 512B sector size. But since solid-state disks don’t have sectors or cylinders or spindles, Pure Storage designed the Purity Operating Environment from the ground up to take advantage of flash’s unique capabilities. So, users gain flash performance without being shackled to any of the constraints of the disk paradigm.

In this document, we provide information to help you optimize the Pure Storage FlashArray for your Oracle database workload. Please note that these are general guidelines that are appropriate for many workloads, but as with all guidelines, you should verify that they are appropriate for your specific environment.

Operating System Recommendations

Pure Storage's operating system recommendations apply to all deployments: databases, VDI, etc. These recommendations apply whether you are using Oracle Automatic Storage Management (ASM), raw devices or a file system for your database storage. 

Queue Settings

We recommend two changes to the queue settings. The first selects the 'noop' I/O scheduler, which has been shown to deliver better performance with lower CPU overhead than the default schedulers (usually 'deadline' or 'cfq'). The second change eliminates the collection of entropy for the kernel random number generator, which has high CPU overhead when enabled for devices supporting high IOPS.

Manually Changing Queue Settings 

(not required unless LUNs are already in use with the wrong settings)

These settings can be safely changed on a running system by locating the Pure LUNs:

grep PURE /sys/block/sd*/device/vendor

and then writing the desired values into the sysfs files:

echo noop > /sys/block/sdx/queue/scheduler

An example for loop is shown here to quickly set all Pure LUNs to the desired 'noop' elevator.

for disk in $(lsscsi | grep PURE | awk '{print $6}'); do
    echo noop > /sys/block/${disk##/dev/}/queue/scheduler
done
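
The same approach works for the other queue attributes enforced by the udev rules shown later in this document. The following is a minimal sketch (device names are illustrative) that also disables entropy collection and redirects completions, then spot-checks one device:

# Apply the queue settings for LUNs already in use; mirrors the udev rules below
for disk in $(lsscsi | grep PURE | awk '{print $6}'); do
    dev=${disk##/dev/}
    echo noop > /sys/block/${dev}/queue/scheduler     # noop elevator
    echo 0    > /sys/block/${dev}/queue/add_random    # no entropy collection
    echo 2    > /sys/block/${dev}/queue/rq_affinity   # complete I/O on the originating CPU
done

# Spot-check one device; the active scheduler is shown in brackets
cat /sys/block/sdx/queue/scheduler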

All changes in this section take effect immediately, without rebooting, on RHEL 5 and RHEL 6. RHEL 4 releases require a reboot.

Applying Queue Settings with udev

Once the I/O scheduler elevator has been set to 'noop', it is often desirable to make the setting persist across reboots.

Step 1: Create the Rules File

Create a new file in the following location (for each respective OS). The Linux OS will use the udev rules to set the elevators after each reboot.

RHEL:
/etc/udev/rules.d/99-pure-storage.rules
Ubuntu:  
/lib/udev/rules.d/99-pure-storage.rules

Step 2: Add the Following Entries to the Rules File  (Version Dependent)

The following entries automatically set the elevator to 'noop' each time the system is rebooted. Create a file with the following entries, ensuring each entry is on a single line with no carriage returns:

For RHEL 6.x, 7.x and SuSE
# Recommended settings for Pure Storage FlashArray.

# Use noop scheduler for high-performance solid-state storage
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/scheduler}="noop"

# Reduce CPU overhead due to entropy collection
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/add_random}="0"

# Spread CPU load by redirecting completions to originating CPU
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/rq_affinity}="2"

# Set the HBA timeout to 60 seconds
ACTION=="add", SUBSYSTEMS=="scsi", ATTRS{model}=="FlashArray      ", RUN+="/bin/sh -c 'echo 60 > /sys/$DEVPATH/device/timeout'"

Please note that 6 spaces are needed after "FlashArray" under "Set the HBA timeout to 60 seconds" above for the rule to take effect.

For RHEL 5.x
# Recommended settings for Pure Storage FlashArray.
 
# Use noop scheduler for high-performance solid-state storage
ACTION=="add|change", KERNEL=="sd*[!0-9]|", SYSFS{vendor}=="PURE*", RUN+="/bin/sh -c 'echo noop > /sys/$devpath/queue/scheduler'" 

It is expected behavior that you only see the settings take effect for the sd* devices. The dm-* devices will not reflect the change directly but will inherit it from the sd* devices that make up their paths.

Recommended Multipath Settings 

ActiveCluster: Additional multipath settings are required for ActiveCluster. Please see ActiveCluster Requirements and Best Practices.

The Multipath Policy defines how the host distributes IOs across the available paths to the storage. The Round Robin (RR) policy distributes IOs evenly across all Active/Optimized paths. A newer MPIO policy, queue-length, is similar to round robin in that IOs are distributed across all available Active/Optimized paths; however, it provides some additional benefits. The queue-length path selector biases IOs toward paths that are servicing I/O more quickly (paths with shorter queues). If one path becomes intermittently disruptive or experiences higher latency, queue-length avoids that path, reducing the effect of the problem path.

The following are recommended entries to existing multipath.conf files (/etc/multipath.conf) for Linux OSes. Add the following to the existing section that controls Pure devices.

Please note that fast_io_fail_tmo and dev_loss_tmo do not apply to iSCSI.

RHEL 7.3+
No manual changes are required. The dm-multipath configuration shown below for PURE is the default with the device-mapper version included in RHEL / Oracle Linux 7.3+.
  device {
        vendor "PURE"
        product "FlashArray"
        path_grouping_policy "multibus"
        path_selector "queue-length 0"
        path_checker "tur"
        features "0"
        hardware_handler "0"
        prio "const"
        failback immediate
        fast_io_fail_tmo 10
        dev_loss_tmo 60
        user_friendly_names no
    }

Supporting info: RHEL 7.3+ includes device-mapper-multipath-0.4.9-99, which adds built-in configuration support for the PURE FlashArray (BZ#1300415).

RHEL 6.2+, SLES 12, and supporting kernels
defaults {
   polling_interval      10
   find_multipaths       yes
}
devices {
   device {
       vendor                "PURE"
       path_selector         "queue-length 0"
       path_grouping_policy  group_by_prio
       path_checker          tur
       fast_io_fail_tmo      10
       dev_loss_tmo          60
       no_path_retry         0
       hardware_handler      "1 alua"
       prio                  alua
       failback              immediate
   }
}
RHEL 5.7+ - 6.1 and supporting kernels
defaults {
    polling_interval      10
}
 
devices {
    device {
        vendor                "PURE"
        path_selector         "round-robin 0"
        path_grouping_policy  multibus
        rr_min_io             1
        path_checker          tur
        fast_io_fail_tmo      10
        dev_loss_tmo          60
        no_path_retry         0
    }
}
RHEL 5.6 and below, and supporting kernels
defaults {
    polling_interval      10
}

devices {
    device {
        vendor                "PURE"
        path_selector         "round-robin 0"
        path_grouping_policy  multibus
        rr_min_io             1
        path_checker          tur
        no_path_retry         0
    }
}
Oracle VM Server
device {
                vendor                "PURE"
                product               "FlashArray"
                path_selector         "queue-length 0"
                path_grouping_policy  group_by_prio
                path_checker          tur
                fast_io_fail_tmo      10
                dev_loss_tmo          60
                no_path_retry         0
                hardware_handler      "1 alua"
                prio                  alua
                failback              immediate
                user_friendly_names   no
        }

More information on multipath settings can be found here: RHEL Documentation
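
After editing /etc/multipath.conf, the changes can be applied without a reboot by reloading the multipath configuration. A typical sequence (assuming the multipathd service is already running; use systemctl on systemd-based releases) is:

# Reload the multipath configuration and re-evaluate existing device maps
service multipathd reload
multipath -r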

Verifying the Settings

You can check the setup by looking at "multipath -ll".  

6.2+ (queue-length)

# multipath -ll

Correct Configuration:
mpathe (3624a93709d5c252c73214d5c00011014) dm-2 PURE,FlashArray
size=100G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 1:0:0:4  sdd  8:48   active ready running
  |- 1:0:1:4  sdp  8:240  active ready running
  |- 1:0:2:4  sdab 65:176 active ready running
  |- 1:0:3:4  sdan 66:112 active ready running
  |- 2:0:0:4  sdaz 67:48  active ready running
  |- 2:0:1:4  sdbl 67:240 active ready running
  |- 2:0:2:4  sdbx 68:176 active ready running
  `- 2:0:3:4  sdcj 69:112 active ready running
...
 
Incorrect Configuration (check for unnecessary spaces in multipath.conf):

3624a9370f35b420ae1982ae200012080 dm-0 PURE,FlashArray
size=500G features='0' hwhandler='0' wp=rw
 |-+- policy='round-robin 0' prio=0 status=active
 | `- 2:0:0:3 sdc 8:32 active undef running
 |-+- policy='round-robin 0' prio=0 status=enabled
 | `- 3:0:0:3 sdg 8:96 active undef running
 |-+- policy='round-robin 0' prio=0 status=enabled
 | `- 1:0:0:3 sdaa 65:160 active undef running
 `-+- policy='round-robin 0' prio=0 status=enabled
 `- 0:0:0:3 sdak 66:64 active undef running
 ...

Below 6.2 (Round Robin)

    # multipath -ll
    ...
    Correct Configuration:
    3624a9370f35b420ae1982ae200012080 dm-0 PURE,FlashArray
    size=500G features='0' hwhandler='0' wp=rw
    `-+- policy='round-robin 0' prio=0 status=active
    |- 2:0:0:3 sdc 8:32 active undef running
    |- 3:0:0:3 sdg 8:96 active undef running
    |- 1:0:0:3 sdaa 65:160 active undef running
    `- 0:0:0:3 sdak 66:64 active undef running
     
    ...
    Incorrect Configuration (check for unnecessary spaces in multipath.conf):

    3624a9370f35b420ae1982ae200012080 dm-0 PURE,FlashArray
    size=500G features='0' hwhandler='0' wp=rw
    |-+- policy='round-robin 0' prio=0 status=active
    | `- 2:0:0:3 sdc 8:32 active undef running
    |-+- policy='round-robin 0' prio=0 status=enabled
    | `- 3:0:0:3 sdg 8:96 active undef running
    |-+- policy='round-robin 0' prio=0 status=enabled
    | `- 1:0:0:3 sdaa 65:160 active undef running
    `-+- policy='round-robin 0' prio=0 status=enabled
    `- 0:0:0:3 sdak 66:64 active undef running
    ...

Excluding Third-Party Vendor LUNs from DM-Multipath

When systems have co-existing multipathing software, it is often necessary to exclude a set of LUNs from one multipathing product so that they can be controlled by another.

The following is an example of using DM-Multipath to blacklist LUNs from a third-party vendor. The syntax prevents DM-Multipath from controlling the LUNs that are blacklisted.

The following can be added to the 'blacklist' section of the multipath.conf file.

blacklist {
    device {
        vendor  "XYZ.*"
        product ".*"
    }

    device {
        vendor  "ABC.*"
        product ".*"
    }
}

HBA I/O Timeout Settings

Though the Pure Storage FlashArray is designed to service I/O with consistently low latency, there are error conditions that can cause much longer latencies, and it is important to ensure that dependent servers and applications are tuned appropriately to ride out these error conditions without issue. By design, under the worst-case recoverable error condition, the FlashArray can take up to 60 seconds to service an individual I/O.

On Solaris hosts, edit /etc/system and either add or modify (if not present) the sd and ssd settings as follows:

set sd:sd_io_time = 0x3c
set ssd:ssd_io_time = 0x3c

Note: 0x3c is hexadecimal for 60.
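
On Linux hosts, the udev rule shown earlier ("Set the HBA timeout to 60 seconds") applies the equivalent setting. A quick spot-check (device name is illustrative):

# Verify the SCSI command timeout, in seconds, for a Pure LUN
cat /sys/block/sdx/device/timeout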

Process Prioritization and Pinning

The log writer ( ora_lgwr_{ORACLE_SID} ) is often a bottleneck for extremely heavy OLTP workloads since it is a single process and it must persist every transaction. A typical AWR Top 5 Timed Foreground Events report might look like the following:

[Image: AWR Top 5 Timed Foreground Events report]

If your AWR report shows high waits on LOG_FILE_SYNC or LOG_FILE_PARALLEL_WRITE, you can consider making these adjustments. However, do not do so unless your system has eight or more cores.

To increase log writer process priority:

  • Use renice
  • E.g., if the log writer process ID is 27250: renice -n -20 27250
  • Probably not advisable if your system has fewer than eight cores

To pin log writer to a given core:

  • Use taskset
  • E.g., if the log writer process ID is 27250: taskset -p 1 27250
  • Probably not advisable if you have fewer than 12 cores
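
A combined sketch, assuming a single instance whose ORACLE_SID is ORCL (the SID, nice value, and CPU mask are illustrative; adjust them to your policy):

# Locate the log writer for the ORCL instance, raise its priority, and pin it to CPU 0
LGWR_PID=$(pgrep -f ora_lgwr_ORCL)
renice -n -20 -p ${LGWR_PID}     # negative nice raises priority; requires root
taskset -p 0x1 ${LGWR_PID}       # 0x1 is a CPU affinity mask selecting CPU 0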

We have found that the Pure Storage FlashArray can sustain redo log write rates over 100MB/s.

Provisioning Storage on the Pure Storage FlashArray

You can provision storage from either the Pure Storage GUI management tool or the command line, as illustrated here.

  1. Create a volume using either the GUI or CLI
    [Screenshot: creating the volume in the Pure Storage GUI]

    Command line equivalent:
    purevol create --size 250G oravol10
  2. Connect the volume to the host.
    [Screenshot: connecting the volume to the host in the GUI]

    Command line equivalent:
    purevol connect --host warthog oradata10
  3. Scan the new volume on the database server (as root):
    # rescan-scsi-bus.sh -i -r
  4. Flush any unused multipath device maps (as root):
    # multipath -F
  5. Detect and map the new volume with multipath (as root):
    # multipath -v2

Note the new volume's unique device identifier (UUID), which is the same as the serial number seen in the Details section of the GUI. In this case it is 3624a9370bb22e766dd8579430001001a.

[Screenshot: volume details showing the serial number / device identifier]

At this point, the newly provisioned storage is ready to use. You can either create a file system on it, or use it as an Automatic Storage Management (ASM) disk.

ASM Versus File System

On a Pure Storage FlashArray, there is no significant performance benefit to using ASM over a traditional file system, so the decision can be driven by your operational policies and guidelines. Whichever storage mechanism you choose performs well. From a DBA's perspective, ASM does offer additional flexibility not found with file systems, such as the ability to move ASM disks from one disk group to another, resize disks, and add volumes dynamically.

Recommendations Common to ASM and File System

Unlike traditional storage, IOPS are not a function of LUN count. In other words, you get the same IOPS capacity with 1 LUN as you do with 100. However, since it is often convenient to monitor database performance by I/O type (for example, LGWR, DBWR, TEMP), we recommend creating ASM disk groups or file systems dedicated to these individual workloads. This strategy allows you to observe the characteristics of each I/O type either at the command line with tools like iostat and pureadm, or with the Pure Storage GUI.

[Screenshot: Pure Storage GUI performance view]

In addition to isolating I/O types to individual disk groups, you should also isolate the flash recovery area (FRA) to its own disk group or file system. We recommend opting for a few large LUNs per disk group.

If performance is a critical concern, we recommend that you do not multiplex redo logs. Multiplexing is not necessary for redundancy, since RAID-HA provides protection against media failures, and it introduces a performance impact of up to approximately 10% for a heavy OLTP workload. If your operations policy requires you to multiplex redo logs, we recommend placing the group members in separate disk groups or file systems. For example, you can create two disk groups, REDOSSD1 and REDOSSD2, and multiplex across them:

[Screenshot: redo log members multiplexed across the REDOSSD1 and REDOSSD2 disk groups]
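
A minimal SQL sketch of adding a multiplexed redo log group across the two disk groups (the group number and size are illustrative):

-- Add one redo log group with a member in each disk group
ALTER DATABASE ADD LOGFILE GROUP 11 ('+REDOSSD1', '+REDOSSD2') SIZE 4G;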

Finally, while some flash storage vendors recommend a 4K block size for both redo logs and the database itself (to avoid block misalignment issues), Pure Storage does not. Since the Pure Storage FlashArray is designed on a 512-byte geometry, we never have block alignment issues. Performance is completely independent of the block size.

ASM Specific Recommendations

In Oracle 11gR2, the default striping for the ONLINELOG template changed from FINE to COARSE. In OLTP workload testing, we found that the COARSE setting for redo logs performs about 20% better. Since the Pure Storage FlashArray includes RAID-HA protection, you can safely use External Redundancy for ASM disk groups. Other factors, such as sector size and AU size, do not have a significant bearing on performance.
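
As a sketch of checking and applying this (the ORAREDO disk group name comes from the recommendations table below):

-- Check the striping attribute of the online redo log template, then set it to COARSE
SELECT group_number, name, stripe FROM v$asm_template WHERE name = 'ONLINELOG';
ALTER DISKGROUP ORAREDO ALTER TEMPLATE onlinelog ATTRIBUTES (COARSE);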

ASM SCANORDER

Pure Storage recommends that you use multipath devices to achieve maximum performance and resiliency. If you are using ASM, you need to configure the SCANORDER to look at multipath devices first. You can do this by changing the following setting in /etc/sysconfig/oracleasm:

Search for: 

ORACLEASM_SCANORDER=""

Change it to:

ORACLEASM_SCANORDER="dm- sd"

More information can be found here: http://www.oracle.com/technetwork/to...th-097959.html

ASM Disk Group Recommendations
Disk Group   Sector Size   Striping   AU Size    Redundancy   Notes
ORACRS       512           COARSE     1048576    External     Small disk group for CRS
ORADATA      512           COARSE     1048576    External     Database segments
ORAREDO      512           COARSE     1048576    External     Redo logs
ORAFRA       512           COARSE     1048576    External     Flash Recovery Area

The following SQL can be used to check the current redundancy type of all disk groups:

select name, allocation_unit_size/1024/1024 as "AU", state, type, round(total_mb/1024,2) as "Total", round(free_mb/1024,2) as "Free"
  from v$asm_diskgroup;

ASM Space Reclamation

As you drop, truncate, or resize database objects in an ASM environment, the space metrics reported by the data dictionary (DBA_FREE_SPACE, V$ASM_DISKGROUP, V$DATAFILE, etc.) reflect your changes as expected. However, these actions may not always trim (free) space on the array immediately.

Oracle provides a utility called ASM Storage Reclamation Utility (ASRU), which expedites the trim operation. For example, after dropping 1.4TB of tablespaces and data files, Oracle reports the newly available space in V$ASM_DISKGROUP, but puredb list space still considers the space to be allocated. Consider the case when we drop the 190GB tablespace ASRUDEMO, which is in the ORADATA disk group.

Before dropping the tablespace:

[Screenshot: v$asm_diskgroup space usage before dropping the ASRUDEMO tablespace]

And on the storage array:

[Screenshot: puredb list space output before dropping the tablespace]

After we drop the ASRUDEMO tablespace, v$asm_diskgroup updates the available space as expected.

[Screenshot: v$asm_diskgroup showing the freed space after dropping the tablespace]
However, we don’t see the space recovered on the storage array.

[Screenshot: puredb list space output still showing the space as allocated]

Although the array’s space reclamation frees space eventually, we can use the ASRU utility (under the “grid” O/S account on the database server) to trim the space immediately.
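
The exact command line depends on the ASRU version you download from Oracle; as a hedged example, it is typically run by the grid user against a single disk group, along these lines:

# Run ASRU against the ORADATA disk group as the grid user (verify syntax for your ASRU version)
su - grid
cd /home/grid/ASRU        # directory where ASRU was unpacked (illustrative path)
./ASRU ORADATA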

[Screenshot: running the ASRU utility against the ORADATA disk group]

After running the ASRU command, the recovered space is visible in the Physical Space column of the puredb list space report.

[Screenshot: puredb list space showing the recovered physical space]

ASMLib and an Alternative

ASMLib is an Oracle-provided utility that allows you to configure block devices for use with ASM. Specifically, it marks devices as ASM disks and sets their permissions so that the O/S account that runs ASM (typically either grid or oracle) can manipulate these devices. For example, to create an ASM disk named MYASMDISK backed by /dev/dm-2 you would issue the command:

/etc/init.d/oracleasm createdisk MYASMDISK /dev/dm-2

Afterward, /dev/dm-2 still appears to have the same ownership and permissions, but ASMLib creates a file, /dev/oracleasm/disks/MYASMDISK, owned by the O/S user and group identified in /etc/sysconfig/oracleasm. Tell the ASM instance to look for potential disks in this directory through the asm_diskstring initialization parameter.
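
For reference, ASMLib's companion commands can be used to rescan for and list the marked disks; a short sketch:

/etc/init.d/oracleasm scandisks     # rescan block devices for ASMLib disk headers
/etc/init.d/oracleasm listdisks     # list marked disks; MYASMDISK should appear

In the ASM instance, asm_diskstring can then point at the ASMLib disk directory, for example '/dev/oracleasm/disks/*' (verify the appropriate value for your environment).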

If you do not want to use ASMLib, you can use udev rules instead, as described below.

ASMLib Alternative: udev

On RHEL 6.x you can use udev to present devices to ASM. Consider the device mpathbk created above. The device is created as /dev/mapper/mpathbk, linked to /dev/dm-4 and owned by root:

[Screenshot: device listing showing /dev/mapper/mpathbk linked to /dev/dm-4 and owned by root]

Perform the following steps to change the device ownership to grid:asmadmin.

  1. Create an entry for the device in the udev rules file /etc/udev/rules.d/12-dm-permissions.rules (an example rule follows this list).
    [Screenshot: 12-dm-permissions.rules entry for the mpathbk device]
  2. Use udevadm to trigger udev events and confirm the change in ownership for the block device:
    [Screenshot: udevadm trigger and the resulting device ownership]
    Note the change in ownership to grid:asmadmin.
  3. Use sqlplus or asmca to create a new disk group or to add the new device to an existing disk group. Since the Purity Operating Environment provides RAID-HA, you can safely use External Redundancy for the disk group. Note that your asm_diskstring (discovery path) should be /dev/dm*.
    [Screenshot: asmca dialog adding the device to a disk group]
    After clicking "OK", the device is added to the disk group and an ASM rebalance operation executes automatically. We recommend using the same size disk for all members of a disk group to ease rebalancing operations.
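
As a hedged illustration of steps 1 and 2 (the mpathbk alias and the grid:asmadmin ownership follow the example above; adjust DM_NAME, OWNER, and GROUP to match your environment):

# /etc/udev/rules.d/12-dm-permissions.rules -- give the multipath device to grid:asmadmin
ENV{DM_NAME}=="mpathbk", OWNER:="grid", GROUP:="asmadmin", MODE:="660"

# Re-trigger udev events and confirm the new ownership on the underlying dm device
udevadm trigger
ls -l /dev/dm-4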
 

File System Recommendations

There is no significant performance penalty for using a file system instead of ASM. As with ASM, we recommend placing data, redo, and the flash recovery area (FRA) on separate volumes to ease administration. We also recommend using the ext4 file system and mounting it with the discard and noatime options. Below is a sample /etc/fstab file showing mount points /u01 (for Oracle binaries, trace files, etc.), /oradata (for data files), and /oraredo (for online redo logs).

[Screenshot: sample /etc/fstab entries for /u01, /oradata, and /oraredo]
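
A minimal /etc/fstab sketch along those lines (the multipath device names are illustrative; substitute the aliases from your own configuration):

# <device>              <mount point>  <type>  <options>                  <dump> <pass>
/dev/mapper/mpatha      /u01           ext4    defaults,noatime           0 2
/dev/mapper/mpathb      /oradata       ext4    defaults,discard,noatime   0 2
/dev/mapper/mpathc      /oraredo       ext4    defaults,discard,noatime   0 2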


The man page for mount describes the discard flag as follows:

discard/nodiscard
Controls whether ext4 should issue discard/TRIM commands to the underlying block device when blocks are freed. This is useful for SSD devices and sparse/thinly-provisioned LUNs, but it is off by default until sufficient testing has been done.

Mounting the ext4 file system with the discard flag causes freed space to be trimmed immediately, just as the ASRU utility trims the storage behind ASM disk groups.

Oracle Settings

For the most part, you don’t need to make changes to your Oracle configuration in order to realize immediate performance benefits on a Pure Storage FlashArray. However, if you have an extremely heavy OLTP workload, there are a few tweaks you can make that help you squeeze the most I/O out of your system. In our testing, we found that the following settings increased performance by about 5%.

init.ora settings

_high_priority_processes='LMS*|LGWR|PMON'
  • Sets the process scheduling priority to RR (round robin)
  • Minimizes the need to "wake" LGWR
  • Underscore parameter: consult Oracle Support before setting
filesystemio_options = SETALL
  • Enables asynchronous and direct I/O
log_buffer = {at least 15MB}
  • Values over 100MB are not uncommon
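
A hedged example of applying these settings through the SPFILE (the log_buffer value shown is illustrative, and the underscore parameter should only be set after consulting Oracle Support):

-- Apply the recommended settings; static parameters take effect after a restart
ALTER SYSTEM SET "_high_priority_processes" = 'LMS*|LGWR|PMON' SCOPE=SPFILE;
ALTER SYSTEM SET filesystemio_options = 'SETALL' SCOPE=SPFILE;
ALTER SYSTEM SET log_buffer = 128M SCOPE=SPFILE;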

Use the CALIBRATE_IO Utility

Oracle provides a built-in package, dbms_resource_manager.calibrate_io, which, like the ORION tool, generates a workload on the I/O subsystem. However, unlike ORION, it works with your running Oracle database, and it generates statistics for the optimizer. Therefore, you should run calibrate_io and gather statistics for your application schema at least once before launching your application.

The calibrate_io script is provided in the Oracle documentation and is presented here using our recommended values for <DISKS> and <MAX_LATENCY>.

SET SERVEROUTPUT ON
DECLARE
 lat INTEGER;
 iops INTEGER;
 mbps INTEGER;
BEGIN
-- DBMS_RESOURCE_MANAGER.CALIBRATE_IO (<DISKS>, <MAX_LATENCY>, iops, mbps, lat);
 DBMS_RESOURCE_MANAGER.CALIBRATE_IO (1000, 10, iops, mbps, lat);
 DBMS_OUTPUT.PUT_LINE ('max_iops = ' || iops);
 DBMS_OUTPUT.PUT_LINE ('latency = ' || lat);
 DBMS_OUTPUT.PUT_LINE ('max_mbps = ' || mbps);
end;
/

Typically you will see output similar to the following: 

max_iops = 134079
latency  = 0 
max_mbps = 1516
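
The calibration results are also persisted in the database and can be reviewed later; for example:

-- Review the stored I/O calibration results
SELECT max_iops, max_mbps, latency FROM dba_rsrc_io_calibrate;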

Conclusion

Many of the architecture decisions and compromises you have had to make with traditional storage are not relevant on a Pure Storage FlashArray. You do not need to sacrifice performance to gain resiliency, nor do you need to change existing policies that you may already have in place. In other words, there is no wrong way to deploy Oracle on Pure Storage; you can expect performance benefits out of the box.

That said, there are some configuration choices you can make to increase flexibility and maximize performance:

  • Use the Pure Storage recommended multipath.conf settings.
  • Set the scheduler, rq_affinity, and entropy settings for the Pure Storage devices.
  • Separate different I/O workloads onto dedicated LUNs for enhanced visibility.
  • If you use a file system for data files, use ext4.
  • Always run calibrate_io.