
Oracle Database Recommended Settings for FlashArray


Please be aware of the following issue when configuring Oracle: Oracle ASM Potential Issue That Can Cause Crashes.

The principal difference between configuring database storage on a Pure Storage FlashArray™ instead of spinning disks is that virtually all of your architecture choices are centered on manageability, not performance. Specifically, none of the following factors are relevant on a Pure Storage array:

  • Stripe width and depth
  • RAID level (mirroring)
  • Intelligent Data Placement (short stroking)
  • O/S and database block size
  • ASM vs. FileSystem

Striping refers to distributing files across multiple hard drives to enable parallel access and to maximize IOPS. A Pure Storage array consists of 22 solid state disks per shelf, and the Purity Operating Environment automatically distributes data across all drives in the array using an algorithm designed to optimize performance and provide redundancy. In other words, the striping is automatic.

The Pure Storage redundancy technology is called RAID-HA, and it is designed specifically to protect against the three failure modes specific to flash storage: device failure, bit errors, and performance variability. You don't need another form of RAID protection, so you don’t need to compromise capacity or performance for data protection. RAID is automatic.

Just as striping and mirroring are irrelevant on a Pure Storage array, so is the block size. Pure Storage is based on a fine-grained 512-byte geometry, so there are no block alignment issues as you might encounter in arrays designed with, for example, a 4KB geometry. Another benefit is a substantially higher deduplication rate than seen on other arrays offering data reduction.

Other flash vendors have designed their solutions on the new Advanced Format (AF) Technology, which allows for 4KB physical sector sizes instead of the traditional 512B sector size. But since solid-state disks don’t have sectors or cylinders or spindles, Pure Storage designed the Purity Operating Environment from the ground up to take advantage of flash’s unique capabilities. So, users gain flash performance without being shackled to any of the constraints of the disk paradigm.

In this document, we provide information to help you optimize the Pure Storage FlashArray for your Oracle database workload. Please note that these are general guidelines that are appropriate for many workloads, but as with all guidelines, you should verify that they are appropriate for your specific environment.

Operating System Recommendations

Pure Storage's operating system recommendations apply to all deployments: databases, VDI, etc. These recommendations apply whether you are using Oracle Automatic Storage Management (ASM), raw devices or a file system for your database storage. 

Queue Settings

We recommend two changes to the queue settings. The first selects the 'noop' I/O scheduler, which has been shown to get better performance with lower CPU overhead than the default schedulers (usually 'deadline' or 'cfq'). The second change eliminates the collection of entropy for the kernel random number generator, which has high CPU overhead when enabled for devices supporting high IOPS.

Manually Changing Queue Settings 

This step is not required unless LUNs are already in use with the wrong settings.

These settings can be changed safely on a running system. First, locate the Pure LUNs:

grep PURE /sys/block/sd*/device/vendor

Then write the desired values into the corresponding sysfs files:

echo noop > /sys/block/sdx/queue/scheduler

The following example for loop quickly sets all Pure LUNs to the desired 'noop' elevator:

for disk in $(lsscsi | grep PURE | awk '{print $6}'); do
    echo noop > /sys/block/${disk##/dev/}/queue/scheduler
done
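The loop above only changes the scheduler. As a minimal sketch, the same approach can also apply the add_random and rq_affinity values used by the udev rules below to every Pure LUN at once; the sysfs paths are standard, but verify the matched device list in your environment before running it:

for dev in /sys/block/sd*; do
    # Act only on Pure LUNs, identified by the vendor string
    if grep -q PURE "${dev}/device/vendor" 2>/dev/null; then
        echo noop > "${dev}/queue/scheduler"    # use 'none' on RHEL 8.x and other blk-mq kernels
        echo 0 > "${dev}/queue/add_random"      # skip entropy collection
        echo 2 > "${dev}/queue/rq_affinity"     # complete I/O on the submitting CPU group
    fi
done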

On RHEL 5 and higher, all changes in this section take effect immediately, without rebooting; RHEL 4 releases require a reboot. These changes will not persist across reboots unless they are added to a udev rule (see below).

Notice that the active scheduler is shown in brackets; [noop] indicates that noop is in effect:

[robm@robm-rhel7 ~]$ cat /sys/block/sdb/queue/scheduler
[noop] deadline cfq

Applying Queue Settings with udev

Once the I/O scheduler has been set to 'noop', you will usually want the setting to persist across reboots.

Step 1: Create the Rules File

Create a new rules file in the location appropriate for your OS. Linux applies these udev rules at each boot to set the elevator automatically.

RHEL:    /etc/udev/rules.d/99-pure-storage.rules
Ubuntu:  /lib/udev/rules.d/99-pure-storage.rules

Step 2: Add the Following Entries to the Rules File  (Version Dependent)

The following entries automatically set the elevator each time the system boots. Add them to the rules file, ensuring each rule is on a single line with no line breaks:

Note that in RHEL 8.x ‘noop’ no longer exists and has been replaced by ‘none’.

RHEL 8.x and SuSE 15.2 and higher
# Recommended settings for Pure Storage FlashArray.
# Use none scheduler for high-performance solid-state storage for SCSI devices
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/scheduler}="none"
ACTION=="add|change", KERNEL=="dm-[0-9]*", SUBSYSTEM=="block", ENV{DM_NAME}=="3624a937*", ATTR{queue/scheduler}="none"

# Reduce CPU overhead due to entropy collection
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/add_random}="0"
ACTION=="add|change", KERNEL=="dm-[0-9]*", SUBSYSTEM=="block", ENV{DM_NAME}=="3624a937*", ATTR{queue/add_random}="0"

# Spread CPU load by redirecting completions to originating CPU
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/rq_affinity}="2"
ACTION=="add|change", KERNEL=="dm-[0-9]*", SUBSYSTEM=="block", ENV{DM_NAME}=="3624a937*", ATTR{queue/rq_affinity}="2"

# Set the HBA timeout to 60 seconds
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{device/timeout}="60"
RHEL 6.x, 7.x
# Recommended settings for Pure Storage FlashArray.

# Use noop scheduler for high-performance solid-state storage
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/scheduler}="noop"

# Reduce CPU overhead due to entropy collection
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/add_random}="0"

# Spread CPU load by redirecting completions to originating CPU
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/rq_affinity}="2"

# Set the HBA timeout to 60 seconds
ACTION=="add", SUBSYSTEMS=="scsi", ATTRS{model}=="FlashArray      ", RUN+="/bin/sh -c 'echo 60 > /sys/$DEVPATH/device/timeout'"

Please note that 6 spaces are needed after "FlashArray" under "Set the HBA timeout to 60 seconds" above for the rule to take effect.

RHEL 5.x
# Recommended settings for Pure Storage FlashArray.
 
# Use noop scheduler for high-performance solid-state storage
ACTION=="add|change", KERNEL=="sd*[!0-9]", SYSFS{vendor}=="PURE*", RUN+="/bin/sh -c 'echo noop > /sys/$devpath/queue/scheduler'"

It is expected behavior that you only see the settings take effect for the sd* devices. The dm-* devices will not reflect the change directly but will inherit it from the sd* devices that make up their paths.
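After creating or editing the rules file, the rules can be applied without a reboot. A minimal sketch, assuming the udevadm utility is available (standard on RHEL 6 and later):

# Reload the udev rules and re-trigger block device events
udevadm control --reload-rules
udevadm trigger --type=devices --subsystem-match=block

# Spot-check one Pure device to confirm the settings took effect
cat /sys/block/sdb/queue/scheduler
cat /sys/block/sdb/queue/add_random /sys/block/sdb/queue/rq_affinity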

Recommended Multipath Settings 

HBA I/O Timeout Settings for Solaris

Though the Pure Storage FlashArray is designed to service IO with consistently low latency, there are error conditions that can cause much longer latencies and it is important to ensure dependent servers and applications are tuned appropriately to ride out these error conditions without issue. By design, given the worst case, recoverable error condition, the FlashArray will take up to 60 seconds to service an individual IO.

Edit /etc/system and add the following sd and ssd settings (or modify them if already present):

set sd:sd_io_time = 0x3c
set ssd:ssd_io_time = 0x3c

Note: 0x3c is hexadecimal for 60.
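Changes to /etc/system take effect at the next reboot. As a quick sanity check before rebooting, confirm the entries are present:

grep -i io_time /etc/system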

ASM Versus File System

On a Pure Storage FlashArray, there is no significant performance benefit to using ASM over a traditional file system, so the decision can be driven by your operational policies and guidelines. Whichever storage mechanism you choose performs well. From a DBA’s perspective, ASM does offer additional flexibility not found with file systems, such as the ability to move ASM disks from one disk group to another, resizing disks, and adding volumes dynamically.

Recommendations Common to ASM and File System

Unlike traditional storage, IOPS are not a function of LUN count. In other words, you get the same IOPS capacity with 1 LUN as you do with 100. However, since it is often convenient to monitor database performance by I/O type (for example, LGWR, DBWR, TEMP), we recommend creating ASM disk groups or file systems dedicated to these individual workloads. This strategy allows you to observe the characteristics of each I/O type either at the command line with tools like iostat and pureadm, or with the Pure Storage GUI.
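For example, a brief sketch of command-line monitoring, assuming the sysstat package is installed; the -N flag resolves device-mapper names so the LUNs backing each disk group are easier to identify:

# Extended per-device statistics in MB/s, refreshed every 5 seconds
iostat -xmN 5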


In addition to isolating I/O types to individual disk groups, you should also give the Fast Recovery Area (FRA) its own disk group. We recommend opting for a few large LUNs per disk group.

If performance is a critical concern, we recommend that you do not multiplex redo logs. It is not necessary for redundancy, since RAID-HA already protects against media failures, and in our testing multiplexing redo logs introduced a performance impact of up to approximately 10% for a heavy OLTP workload. If your operations policy requires you to multiplex redo logs, we recommend placing the group members in separate disk groups or file systems. For example, you can create two disk groups, REDOSSD1 and REDOSSD2, and multiplex across them, as shown below:
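A hedged sketch of adding a multiplexed log group across the two disk groups; the group number and size are illustrative:

SQL> alter database add logfile group 4 ('+REDOSSD1', '+REDOSSD2') size 1G;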


Finally, while some flash storage vendors recommend a 4K block size for both redo logs and the database itself (to avoid block misalignment issues), Pure Storage does not. Since the Pure Storage FlashArray is designed on a 512-byte geometry, we never have block alignment issues. Performance is completely independent of the block size.

ASM Specific Recommendations

In Oracle 11gR2, the default striping for the ONLINELOG template changed from FINE to COARSE. In OLTP workload testing, we found that the COARSE setting for redo logs performs about 20% better. Since the Pure Storage FlashArray includes RAID-HA protection, you can safely use External Redundancy for ASM disk groups. Other factors such as sector size and AU size do not have a significant bearing on performance.

ASM SCANORDER

Pure Storage recommends that you use multipath disks to achieve maximum performance and resiliency. If you are using ASM, you need to configure the SCANORDER to look at multipath devices first. You can do this by changing the following setting in /etc/sysconfig/oracleasm.

Search for: 

ORACLEASM_SCANORDER=""

Change to:

ORACLEASM_SCANORDER="dm- sd"
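After editing the file, ASMLib needs to rescan the disks for the new scan order to take effect. A minimal sketch (the exact service commands vary by ASMLib version):

oracleasm scandisks
oracleasm listdisks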

More information can be found here: http://www.oracle.com/technetwork/to...th-097959.html.

ASM Disk Group Recommendations
Disk Group   Sector Size   Striping   AU Size    Redundancy   Notes
ORACRS       512           COARSE     1048576    External     Small disk group for CRS
ORADATA      512           COARSE     1048576    External     Database segments
ORAREDO      512           COARSE     1048576    External     Redo logs
ORAFRA       512           COARSE     1048576    External     Fast Recovery Area
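As an illustration only, a disk group matching the ORADATA row above could be created as follows; the disk path and compatible.asm value are assumptions that must match your environment:

SQL> create diskgroup ORADATA external redundancy
     disk '/dev/oracleasm/disks/DATA1'
     attribute 'au_size'='1M', 'sector_size'='512', 'compatible.asm'='12.2';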

The following SQL can be used to check the current redundancy type and space usage of all disk groups:

select name, allocation_unit_size/1024/1024 as "AU", state, type, round(total_mb/1024,2) as "Total", round(free_mb/1024,2) as "Free"
  from v$asm_diskgroup;

ASM Space Reclamation

Reclaiming Unused Space in an Oracle Database

As you drop, truncate, or resize database objects in an ASM environment, the space metrics reported by the data dictionary (DBA_FREE_SPACE, V$ASM_DISKGROUP, V$DATAFILE, etc.) reflect your changes as expected. However, these actions may not always trim (free) space on the array immediately. The following options are available to reclaim free space on the FlashArray.

Method 1 - ASMFD THIN_PROVISIONED Attribute

In Oracle 12.1, Oracle released a new kernel module called ASM Filter Driver (ASMFD) that sits in the I/O path of ASM disks. ASMFD provides additional functionality over what was provided by ASMLib. The feature that interests us here is a new attribute introduced in 12.2 called THIN_PROVISIONED.

The THIN_PROVISIONED attribute enables or disables the functionality to discard unused storage space after a disk group rebalance is completed. The attribute value can be set to true to enable, or set to false to disable the functionality. The default value is false.

$ sqlplus / as sysasm
SQL> alter diskgroup DATA set attribute 'thin_provisioned'='TRUE';

When the COMPACT phase of a rebalance operation has completed, Oracle ASM informs the FlashArray which space is no longer used and can be repurposed.

ASM Disk Group rebalance can be triggered using the following command.

SQL> alter diskgroup DATA rebalance with balance compact wait;
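To confirm the attribute is in effect, a simple check against v$asm_attribute from the ASM instance:

SQL> select g.name as diskgroup, a.value as thin_provisioned
       from v$asm_attribute a
       join v$asm_diskgroup g on g.group_number = a.group_number
      where a.name = 'thin_provisioned';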

Things to be aware of before implementing ASM Filter Driver:

  1. Check the ASM Filter Driver Certification Matrix for supported Linux kernel versions.
  2. If ASMLib is installed and configured for an existing Oracle ASM installation, then it must be explicitly uninstalled before installing and configuring Oracle ASMFD.
  3. Make sure you are on a database version that includes a fix to Bug 28378446 - Using AFD and Attribute "thin_provisioned=true" Can Cause Database Corruption.

Method 2 - ASM Disk Group Reorganization

As discussed above, using the ASM Filter Driver with the THIN_PROVISIONED attribute set to true can reclaim space when a tablespace is dropped. However, ASM Filter Driver is relatively new, and many Oracle database installations either have not made, or for various reasons cannot make, the switch to it. Moreover, there are cases where the database has a large amount of free space due to dropped or truncated objects, but it is not practical to drop the tablespace. For such scenarios, the method described below can be used.

Add a New Disk and Drop the Old One

Identify the disks that have a large amount of free space that can be reclaimed, and the disk group they are part of.
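A hedged sketch for identifying candidates, run from the ASM instance:

SQL> select g.name as diskgroup, d.name as disk,
            round(d.total_mb/1024,2) as total_gb,
            round(d.free_mb/1024,2) as free_gb
       from v$asm_disk d
       join v$asm_diskgroup g on g.group_number = d.group_number
      order by g.name, d.name;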

Execute the following command to replace the existing bloated disk in the disk group with a new disk. This SQL also initiates an ASM rebalance of the blocks from the old disk to the new disk. Because the wait clause is used, control is returned only after the operation is complete.

SQL> alter diskgroup DATA add disk '/dev/oracleasm/disks/DATA2' name DATA2
      drop disk DATA1
      rebalance power 16 wait;

At the end of this command, the old disk is no longer part of the disk group, and its volume can be safely deleted on the FlashArray.

Note: Oracle recommends that all the disks in a disk group be of the same size. Starting with version 12.2, Oracle enforces this condition for disk groups created with Normal or High redundancy.

ASMLib and Alternatives

ASMLib is an Oracle-provided utility that allows you to configure block devices for use with ASM. Specifically, it marks devices as ASM disks and sets their permissions so that the o/s account that runs ASM (typically either grid or oracle) can manipulate these devices. For example, to create an ASM disk named MYASMDISK backed by /dev/dm-2 you would issue the command:

/etc/init.d/oracleasm createdisk MYASMDISK /dev/dm-2

Afterward, /dev/dm-2 still appears to have the same ownership and permissions, but ASMLib creates a file, /dev/oracleasm/disks/MYASMDISK, owned by the O/S user and group identified in /etc/sysconfig/oracleasm. Point the ASM instance at this directory through the asm_diskstring initialization parameter.

If you prefer not to use ASMLib, you can use udev rules instead, which provide the underlying device naming and permission functionality that ASMLib relies on.
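As a sketch only, a udev rule of the following form can set the ownership and permissions that ASMLib would otherwise manage; the grid owner, asmadmin group, and DM_NAME prefix are assumptions to adapt to your environment:

# Hypothetical /etc/udev/rules.d/99-oracle-asm.rules entry
ACTION=="add|change", KERNEL=="dm-[0-9]*", SUBSYSTEM=="block", ENV{DM_NAME}=="3624a937*", OWNER="grid", GROUP="asmadmin", MODE="0660"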

Dedicated ASM Disk Groups for Each Database

ASM allows files from multiple databases to be placed in a common disk group; however, we recommend creating dedicated ASM disk groups for each database.

ASM SPFILE Location

When a new database is created using the Database Configuration Assistant (dbca), the ASM password file (orapwasm) and spfile are typically placed in the +DATA disk group. We recommend that database files not be placed on the same disk group as these ASM configuration files, because restoring the database from a snapshot backup would overwrite the ASM configuration files as well. This can have the unintended consequence of crashing and/or corrupting the ASM instance.

Disable ASM Compact

To disable the compact phase of ASM rebalance operations, set the hidden disk group attribute '_rebalance_compact' to FALSE:

ALTER DISKGROUP DATA_PRODDB SET ATTRIBUTE '_rebalance_compact'='FALSE';

File System Recommendations

There is no significant performance penalty for using a file system instead of ASM. As with ASM, we recommend placing data, redo, and the fast recovery area (FRA) on separate volumes to ease administration. We also recommend using the ext4 file system and mounting it with the discard and noatime options. Below is a sample /etc/fstab file showing mount points /u01 (for Oracle binaries, trace files, etc.), /oradata (for datafiles), and /oraredo (for online redo logs).

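A representative sketch of such an /etc/fstab; the device-mapper aliases are assumptions, while discard and noatime are the options recommended above:

/dev/mapper/u01-lun      /u01      ext4    defaults,discard,noatime    0 0
/dev/mapper/oradata-lun  /oradata  ext4    defaults,discard,noatime    0 0
/dev/mapper/oraredo-lun  /oraredo  ext4    defaults,discard,noatime    0 0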


The man page for mount describes the discard flag as follows:

discard/nodiscard
Controls whether ext4 should issue discard/TRIM commands to the underlying block device when blocks are freed. This is useful for SSD devices and sparse/thinly-provisioned LUNs, but it is off by default until sufficient testing has been done.

Mounting the ext4 file system with the discard flag causes freed space to be trimmed immediately, just as the ASRU utility trims the storage behind ASM disk groups.

Oracle Settings

For the most part, you don’t need to make changes to your Oracle configuration in order to realize immediate performance benefits on a Pure Storage FlashArray. However, if you have an extremely heavy OLTP workload, there are a few tweaks you can make that help you squeeze the most I/O out of your system. In our testing, we found that the following settings increased performance by about 5%.

filesystemio_options = SETALL
  • Enables both direct I/O and asynchronous I/O where possible (see the sketch after this list).
log_buffer = {at least 15MB}
  • Values over 100MB are not uncommon.
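Both parameters are static, so a minimal sketch of applying them writes to the spfile and takes effect after an instance restart; the log_buffer value shown is illustrative:

SQL> alter system set filesystemio_options = SETALL scope = spfile;
SQL> alter system set log_buffer = 128M scope = spfile;
SQL> shutdown immediate
SQL> startup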

On Linux, Automatic Storage Management uses asynchronous Input-Output by default. Asynchronous Input-Output is not supported for database files stored on Network File Systems.

If the file system files are managed through ODM library interface or Direct NFS Client, asynchronous Input-Output is enabled by default. There is no need to set FILESYSTEMIO_OPTIONS to enable asynchronous Input-Output in these environments.

See: A.1 Supporting Asynchronous Input-Output

Use the CALIBRATE_IO Utility

Oracle provides a built-in procedure, dbms_resource_manager.calibrate_io, which, like the ORION tool, generates workload on the I/O subsystem. However, unlike ORION, it works with your running Oracle database, and it generates statistics for the optimizer. Therefore, you should run calibrate_io and gather statistics for your application schema at least once before launching your application.

The calibrate_io script below is as provided in the Oracle documentation, presented here with our recommended values for <DISKS> and <MAX_LATENCY>.

SET SERVEROUTPUT ON
DECLARE
 lat INTEGER;
 iops INTEGER;
 mbps INTEGER;
BEGIN
-- DBMS_RESOURCE_MANAGER.CALIBRATE_IO (<DISKS>, <MAX_LATENCY>, iops, mbps, lat);
 DBMS_RESOURCE_MANAGER.CALIBRATE_IO (1000, 10, iops, mbps, lat);
 DBMS_OUTPUT.PUT_LINE ('max_iops = ' || iops);
 DBMS_OUTPUT.PUT_LINE ('latency = ' || lat);
 dbms_output.put_line('max_mbps = ' || mbps);
end;
/

Typically you will see output similar to the following: 

max_iops = 134079
latency  = 0 
max_mbps = 1516

Note that timed_statistics must be set to TRUE before calibrate_io is run.

This can be checked with:

SQL> show parameter timed_statistics
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
timed_statistics                     boolean     TRUE
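If it is set to FALSE, it can be enabled dynamically:

SQL> alter system set timed_statistics = TRUE;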

Regarding Front End Buffer Exhaustion and Large I/O Size Requests

The FlashArray has a limited number of front-end buffers available to allocate for incoming writes and outgoing reads to hosts. For simplicity, imagine there are only 4 front-end read buffers with a total size of 512KB, and reads are allocated in 128KB chunks. Host1 sends one read of 400KB, and Host2 sends two reads of 200KB each, also totaling 400KB. Each host's 400KB of reads requires buffers in 128KB units: 400KB / 128KB = 3.125, which rounds up to a minimum of 4 buffers. Because Host1 reached the FlashArray first, no front-end buffers remain, so Host2's I/Os are queued and must wait until Host1's single I/O completes before any further I/Os can be serviced. If both hosts instead sent 128KB reads, four I/Os could be processed in parallel rather than a single 400KB I/O or two 200KB I/Os. The larger the I/O size, the more front-end buffers must be allocated to service each I/O.

This issue is exacerbated when multiple hosts send large I/O requests at the same time, or when workflow demands grow over time in an environment where large I/Os have always been used. Once large I/Os have consumed all front-end buffers, Purity must start queuing I/O, which increases latency; I/O is serialized (processed sequentially) rather than processed in parallel. This is also why it is important to follow SAN best practices: having fewer I/O paths available artificially creates a bottleneck by not maximizing the number of available front-end buffers. It is vitally important to follow the relevant host, workflow, and environment KBs to help prevent potential performance issues (relevant KBs can be found through the Solutions Landing Page). Large host I/O requests are often overlooked as a source of performance issues, so make sure that I/O request sizes are reasonable and that host and environment recommendations are being followed to maximize host performance to the FlashArray.

Regarding I/O Size and Host Queue Depth

There are two approaches: lowering the I/O size is one way to achieve better performance, but if you cannot lower the I/O size and hosts are going to issue large I/Os, then lower the host queue depth instead. There are no magic guidelines such as "if using X I/O size then use Y queue depth"; you will need to test to see what provides the best performance for your particular workflows.

As a general rule, hosts using large I/O sizes should use a smaller queue depth, and hosts using smaller I/O sizes can use larger queue depths, as sketched below.
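How the queue depth is configured persistently depends on the HBA driver, but the effective per-LUN value can be inspected and adjusted through sysfs on most Linux hosts. A hedged sketch; the device name and the value of 32 are illustrative:

# Check the current queue depth for one Pure LUN
cat /sys/block/sdb/device/queue_depth

# Temporarily lower it; persist the change via HBA driver options or a udev rule
echo 32 > /sys/block/sdb/device/queue_depth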

VMware

Pure Storage recommends grouping your ESXi hosts into clusters within vCenter, as this provides a variety of benefits such as High Availability and the Distributed Resource Scheduler (DRS). To simplify provisioning, Pure Storage also recommends creating host groups that correspond to VMware clusters. Therefore, for every VMware cluster that will use FlashArray storage, a respective host group should be created. Every ESXi host in the cluster should have a corresponding host (as described above) added to that host group.


Conclusion

Many of the traditional architecture decisions and compromises you have had to make with traditional storage are not relevant on a Pure Storage FlashArray. You do not need to sacrifice performance to gain resiliency, nor do you need to change existing policies that you may already have in place. In other words, there is no wrong way to deploy Oracle on Pure Storage; you can expect performance benefits out of the box.

That said, there are some configuration choices you can make to increase flexibility and maximize performance:

  • Use the Pure Storage recommended multipath.conf settings.
  • Set the scheduler, rq_affinity, and add_random (entropy) settings for the Pure Storage devices.
  • Separate different I/O workloads onto dedicated LUNs for enhanced visibility.
  • If you use a file system for data files, use ext4.
  • Always run calibrate_io.