The two keys to FlashArray cost-effectiveness are highly efficient provisioning and data reduction. One of an array administrator's primary tasks is understanding and managing physical and virtual storage capacity. This chapter describes the ways in which physical storage and virtual capacity are used and measured.
Array Capacity and Storage Consumption
Administrators monitor physical storage consumption and manage it by adding storage capacity or relocating data sets when available (unallocated) storage becomes dangerously low.
Physical Storage States
In a FlashArray, the physical storage that holds data can be in one of four states: unique, shared, stale, and unallocated.
Figure 1. FlashArray Physical Storage States
Unique data. Reduced host-written data that is not duplicated elsewhere in the array, together with its descriptive metadata.
Shared data. Deduplicated data. Data that comprises the contents of two or more sector addresses in the same or different volumes (FlashArray deduplication is array-wide).
Stale data. Overwritten or deleted data. Data representing the contents of virtual sectors that have been overwritten or deleted by a host or by an array administrator. Such storage is deallocated and made available for future use by the continuous storage reclamation process, but because the process runs asynchronously in the background, deallocation is not immediate.
Unallocated storage. Available for storing incoming data.
Reporting Array Capacity and Storage Consumption
Array physical storage capacity and the amount of storage occupied by data and metadata are displayed through the Purity GUI and Purity CLI (purearray list --space). For example,
$ purearray list --space
Name   Capacity  …  System  Shared Space  Volumes  Snapshots  Total
FLASH  10.05T    …  1.77T   2.78T         2.09T    409.08G    7.04T
Volume and Snapshot Storage Consumption
FlashArrays present disk-like volumes to connected hosts. They also maintain immutable snapshots of volume contents. As with conventional disks, a volume's storage capacity is presented as a set of consecutively numbered 512-byte sectors into which data can be written and from which it can be read. Hosts read and write data in blocks, which are represented as consecutively numbered sequences of sectors.
Purity allocates (also known as "provisions") storage for data written by hosts, and reduces the data before storing it.
The provisioned size of a volume is its capacity as reported to hosts. As with conventional disks, the size presented by a FlashArray volume is nominally fixed, although it can be increased or decreased by an administrator. To optimize physical storage utilization, however, FlashArray volumes are thin and micro provisioned.
Thin provisioning. Like conventional arrays that support thin provisioning, FlashArrays do not allocate physical storage for volume sectors that no host has ever written, or for trimmed (expressly deallocated by host or array administrator command) sector addresses.
Micro provisioning. Unlike conventional thin provisioning arrays, FlashArrays allocate only the exact amount of physical storage required by each host-written block after reduction. In FlashArrays, there is no concept of allocating storage in "chunks" of some fixed size.
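The difference between chunk-based thin provisioning and micro provisioning can be sketched in a few lines. This is an illustrative model, not Purity's allocator; the 64 KiB chunk size is an assumption chosen only to represent a typical fixed-chunk scheme.

```python
def chunk_alloc(nbytes, chunk=64 * 1024):
    """Conventional thin provisioning: round the allocation up to a
    fixed chunk size (64 KiB here, an assumed example value)."""
    return ((nbytes + chunk - 1) // chunk) * chunk

def micro_alloc(reduced_nbytes):
    """Micro provisioning as described above: allocate exactly the
    post-reduction size of the block (illustrative model only)."""
    return reduced_nbytes

# A 4 KiB host write that compresses to 1,126 bytes:
reduced = 1126
print(chunk_alloc(reduced))   # 65536 — one full 64 KiB chunk
print(micro_alloc(reduced))   # 1126
```

The hypothetical chunk allocator wastes nearly the entire chunk on a small reduced block; the micro allocator consumes only what the reduced data actually requires.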
The second key to FlashArray cost-effectiveness is data reduction: the elimination of redundant data through pattern elimination, duplicate elimination, and compression.
Pattern elimination. When Purity detects sequences of incoming sectors whose contents consist entirely of repeating patterns, it stores a description of the pattern and the sectors that contain it rather than the data itself. The software treats zero-filled sectors as if they had been trimmed—no space is allocated for them.
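Pattern elimination can be sketched as a check for whether a sector is a whole-number repetition of a short pattern. This is a simplified illustration, not Purity's detection logic; the candidate pattern lengths are assumptions for the example.

```python
def pattern_descriptor(sector: bytes):
    """If the sector is a repetition of a short pattern, return a
    (pattern, repeat_count) descriptor instead of the data. Zero-filled
    sectors return None, mimicking 'treated as if trimmed'.
    Illustrative sketch only."""
    if sector == b"\x00" * len(sector):
        return None                      # zero-filled: no space allocated
    for plen in (1, 2, 4, 8, 16):        # assumed candidate pattern lengths
        if len(sector) % plen == 0:
            pattern = sector[:plen]
            if pattern * (len(sector) // plen) == sector:
                return (pattern, len(sector) // plen)
    return sector                        # not a repeating pattern: store data

assert pattern_descriptor(b"\x00" * 512) is None
assert pattern_descriptor(b"\xab" * 512) == (b"\xab", 512)
```

A descriptor of a few bytes replaces a full 512-byte sector, which is why patterned data consumes almost no physical storage.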
Duplicate elimination. Purity computes a hash value for each incoming sector and attempts to determine whether another sector with the same hash value is stored in the array. If so, the sector is read and compared with the incoming one to avoid the possibility of aliasing. Instead of storing the incoming sector redundantly, Purity stores an additional reference to the single data representation. Purity deduplicates data globally (across an entire array), so if an identical sector is stored in an array, it is a deduplication candidate, regardless of the volume(s) with which it is associated.
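The hash-then-verify scheme described above can be sketched as follows. Class and field names are invented for illustration; Purity's actual metadata structures are not public, and the choice of SHA-256 here is an assumption.

```python
import hashlib

class DedupStore:
    """Minimal sketch of hash-based deduplication with a byte-level
    verify step, loosely modeled on the behavior described above."""

    def __init__(self):
        self.by_hash = {}      # hash digest -> stored sector bytes
        self.refcount = {}     # hash digest -> number of references

    def write_sector(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        existing = self.by_hash.get(digest)
        # Compare byte-for-byte to rule out hash aliasing (two different
        # sectors hashing to the same value) before treating the
        # incoming sector as a duplicate.
        if existing is not None and existing == data:
            self.refcount[digest] += 1   # store a reference, not a copy
        else:
            self.by_hash[digest] = data
            self.refcount[digest] = 1
        return digest

store = DedupStore()
first = store.write_sector(b"\x01" * 512)
second = store.write_sector(b"\x01" * 512)  # duplicate: only a reference is added
assert first == second and store.refcount[first] == 2
```

Because the lookup table spans the whole store rather than one volume, the sketch also mirrors the array-wide scope of FlashArray deduplication.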
Compression. Purity attempts to compress the data in incoming sectors, cursorily upon entry, and more exhaustively during its continuous storage reclamation background process.
Purity applies pattern elimination, duplicate elimination, and compression techniques to data as it enters an array, as well as throughout the data's lifetime.
The following hypothetical example illustrates the cumulative effect of FlashArray data reduction on physical storage consumption.
Figure 2. Data Reduction Example
In the example, hosts have written data to a total of 1,000 unique sector addresses. The written blocks reduce as follows:
Pattern elimination. 100 blocks contain repeated patterns, for which Purity stores metadata descriptors rather than the actual data.
Duplicate elimination. 200 blocks are duplicates of blocks already stored in the array; Purity stores references to these rather than duplicating stored data.
Compression. The remaining 700 blocks (70%) compress to half their host-written size; Purity compresses them upon entry and again during continuous storage reclamation.
Therefore, the net physical storage consumed by host-written data in this example is 35% of the number of unique volume sector addresses to which hosts have written data.
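The arithmetic behind the 35% figure can be verified directly from the numbers in the example:

```python
# Recomputing the hypothetical data reduction example above.
total_sectors = 1000
pattern = 100        # stored as pattern descriptors: ~0 data space
duplicates = 200     # stored as references to existing data: ~0 new space
remaining = total_sectors - pattern - duplicates   # 700 sectors
compressed = remaining * 0.5                       # 2:1 compression -> 350

net = compressed / total_sectors
print(f"net physical consumption: {net:.0%}")  # net physical consumption: 35%
```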
The data reduction example is hypothetical; each data set reduces differently, and unrelated data stored in an array can influence reduction. Nevertheless, administrators can use the array and volume measures reported by Purity to estimate the amount of physical storage likely to be consumed by data sets similar to those already stored in an array.
Snapshots and Physical Storage
FlashArray snapshots occupy physical storage only in proportion to the number of source volume sectors that hosts overwrite after the snapshots are taken.
Figure 3. Snapshot Space Consumption Example
In the example, two snapshots of a volume, S1 and S2, are taken at times t1 and t2 (t1 prior to t2). If a host writes data to the volume after t1 but before t2, Purity preserves the overwritten sectors' original contents and associates them with S1 (i.e., space accounting charges them to S1). If in the interval between t1 and t2 a host reads sectors from snapshot S1, Purity delivers:
For sectors not modified since t1, current sector contents associated with the volume.
For sectors modified since t1, preserved volume sector contents associated with S1.
Similarly, if a host writes volume sectors after t2, Purity preserves the overwritten sectors' previous contents and associates them with S2 for space accounting purposes. If a host reads sectors from S2, Purity delivers:
For sectors not modified since t2, current sector contents associated with the volume.
For sectors modified since t2, preserved volume sector contents associated with S2.
If, however, a host reads sectors from S1 after t2, Purity delivers:
For sectors not modified since t1, current sector contents associated with the volume.
For sectors modified between t1 and t2, preserved volume sector contents associated with S1.
For sectors modified since t2, preserved volume sector contents associated with S2.
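The read-resolution rules above can be sketched as a lookup over per-sector modification times: if a sector has not changed since the snapshot, the current volume content is still valid; otherwise the content comes from the oldest snapshot at or after the one being read that preserved it. Function and variable names are illustrative, not Purity internals.

```python
def resolve_snapshot_read(sector, snap_time, current, preserved, last_modified):
    """Return the content a read of the snapshot taken at `snap_time`
    should see for `sector`. `preserved` maps snapshot_time ->
    {sector: pre-overwrite content}; `last_modified` maps sector ->
    time of most recent host write (absent if never rewritten)."""
    mtime = last_modified.get(sector)
    if mtime is None or mtime <= snap_time:
        # Not modified since the snapshot: current volume content is valid.
        return current[sector]
    # Modified since the snapshot: the oldest snapshot at or after
    # snap_time that preserved this sector holds the right content.
    for t in sorted(preserved):
        if t >= snap_time and sector in preserved[t]:
            return preserved[t][sector]
    return current[sector]

# t1 = 1, t2 = 2. Sector 0 was overwritten between t1 and t2 (old "A"
# preserved with S1) and again after t2 (old "B" preserved with S2).
current = {0: "C", 1: "X"}
preserved = {1: {0: "A"}, 2: {0: "B"}}
last_modified = {0: 2.5}

print(resolve_snapshot_read(0, 1, current, preserved, last_modified))  # A
print(resolve_snapshot_read(0, 2, current, preserved, last_modified))  # B
print(resolve_snapshot_read(1, 1, current, preserved, last_modified))  # X
```

Note the third case: sector 1 was never rewritten, so reads from either snapshot see the live volume content, which is why unmodified sectors charge no space to any snapshot.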
If S1 is destroyed, storage associated with it is reclaimed because there is no longer a need to preserve pre-update content for updates made prior to t2.
If S2 is destroyed, however, storage associated with it is preserved and associated with S1 because the data in it represents pre-update content for sectors updated after t1.
To generalize, for volumes with two or more snapshots:
Destroying the oldest snapshot. Space associated with the destroyed snapshot is reclaimed after the 24-hour eradication pending period has elapsed or after an administrator purposely eradicates the destroyed snapshot.
Destroying other snapshots. Space associated with the destroyed snapshot is associated with the next older snapshot unless it is already reflected there because the same sector was written both after the next older snapshot and after the destroyed snapshot, in which case it is reclaimed.
Reporting Volume and Snapshot Storage Consumption
Because data stored in a FlashArray is virtualized, thin-provisioned, and reduced, volume storage is monitored, managed, and displayed from two viewpoints:
Host view. Displays the virtual storage capacity (size) and consumption as seen by the host storage administration tools.
Array view. Displays the physical storage capacity occupied by data and the metadata that describes and protects it.
Volume size and physical storage consumption are displayed through the Purity GUI and Purity CLI (purevol list --space).
For example (single output displayed over two rows),
$ purevol list --space
Name  Size  Thin Provisioning  Data Reduction  Total Reduction  …  Total
VOL1  3T    50%                3.7 to 1        7.4 to 1         …  664.64G

$ purevol list --space
Name  Size  …  Volume   Snapshots  Shared Space  System   Total
VOL1  3T    …  484.78G  31.85G     -             148.01G  664.64G
FlashArray Data Lifecycle
Data stored in a FlashArray undergoes continuous reorganization to improve physical storage utilization and reclaim storage occupied by data that has been superseded by host overwrite or deletion.
Figure 4. FlashArray Physical Storage Life Cycle
The steps enumerated in Figure 4 are as follows:
- Host Write Processing (1)
Data written by hosts undergoes initial processing as it enters an array; the result is initially reduced data placed in write buffers.
- Writing to Persistent Storage (2)
As write buffers fill, they are written to segments of persistent flash storage.
- Segment Selection (3)
A Purity background process continually monitors storage segments for data that has been obsoleted by host overwrites, volume destruction or truncation, or trimming (4). Segments that contain a predominance of obsoleted data become high-priority candidates for storage reclamation and further reduction of the live data in them.
- Data Reduction (5)
As Purity processes segments, it opportunistically deduplicates and compresses (5) the live data in them, using more exhaustive algorithms than those used during initial reduction (1). Reprocessed data is moved to write buffers that are being filled; thus, write buffers generally contain a combination of new data entering the array and data that has been moved from segments being vacated to improve utilization.
- Storage Reclamation (6)
As live data is moved out of segments, the vacated segments are returned to the pool of storage available for allocation. Purity treats all of an array's flash module storage as a single homogeneous pool.
- Reallocation (7)
Purity allocates segments of storage from the pool of available flash modules as they are required. Typically, the software fills write buffers for multiple segments concurrently. This allows the software to consolidate different types of data (e.g., highly-compressible, highly-duplicated, etc.) so that the most appropriate policies can be applied to them.
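The segment-selection step in this lifecycle can be illustrated with a simple priority rule: segments that are mostly stale are the cheapest to vacate, because little live data must be moved before they can be reclaimed. This is a sketch of the general log-structured reclamation idea under an assumed threshold, not Purity's actual heuristics.

```python
def reclamation_candidates(segments, threshold=0.5):
    """Rank segments by the fraction of obsoleted (stale) data they
    contain; return those above `threshold`, highest fraction first.
    `threshold` is an assumed example value."""
    scored = [(seg["stale"] / seg["size"], name)
              for name, seg in segments.items()]
    return [name for frac, name in sorted(scored, reverse=True)
            if frac >= threshold]

segments = {
    "seg-a": {"size": 100, "stale": 90},   # mostly obsoleted: vacate first
    "seg-b": {"size": 100, "stale": 20},   # mostly live: leave in place
    "seg-c": {"size": 100, "stale": 60},
}
print(reclamation_candidates(segments))  # ['seg-a', 'seg-c']
```

Vacating seg-a frees 100 sectors of physical space at the cost of moving only 10 live sectors into a fresh write buffer, which is exactly the trade-off the background process optimizes.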
Occasionally, continuous data reduction can result in behavior unfamiliar to administrators experienced with conventional arrays. For example, as Purity detects additional duplication and compresses block contents more efficiently, a volume's physical storage occupancy may decrease, even as hosts write more data to it.