Best Practices for Splunk on Pure Storage

This document covers best practices for running Splunk on Pure Storage. It includes the Splunk Classic architecture, with Hot/Warm and Cold tiers on Pure FlashArray over FC/iSCSI or the Cold tier on FlashBlade over NFS; the Splunk SmartStore architecture, with data on Pure FlashBlade over S3; and Splunk Multisite SmartStore with Pure FlashBlade.

Scope

Splunk Classic architecture

  • Hot/Warm/Cold on Pure FlashArray over FC/iSCSI
  • Cold on Pure FlashBlade over NFS 

Splunk SmartStore architecture

  • Hot/Warm Cache on Pure FlashArray over FC/iSCSI (or DAS)
  • Warm remote on Pure FlashBlade over S3

 

Splunk Classic Architecture

Pure Volumes on FlashArray (Hot/Warm, Cold)

Configuring volumes for Splunk indexers could not be any simpler: due to the unique capabilities of flash and the design of the Purity Operating Environment, the factors below are neither relevant nor significant on FlashArray.

Factors | Relevancy | Details
--- | --- | ---
Stripe width and depth | Automatic | The Purity Operating Environment automatically distributes data across all drives in the array
RAID level | Automatic | Pure FlashArray uses RAID-HA, designed to protect against three failure modes specific to flash storage: device failure, bit errors, and performance variability
Intelligent data placement | Insignificant | The Purity Operating Environment has been designed from the ground up to take advantage of flash’s unique capabilities and is not constrained by the disk paradigm, so “hot” and “cold” disk platter placements are not relevant

For ease of bucket management, and to enable backups of Warm or Cold buckets, we recommend using separate Pure volumes for Hot/Warm, Cold, and Frozen buckets (if you decide to use Frozen on FlashArray) per indexer.

Bucket Type | Volume count | Location in indexes.conf
--- | --- | ---
Hot/Warm | 1 FA volume per indexer | Separate volume stanza for Hot/Warm buckets, such as [volume:hot] with path = /hot/splunk
Cold | 1 FA volume per indexer | Separate volume stanza for Cold buckets, such as [volume:cold] with path = /cold/splunk
Frozen | 1 FA volume per indexer | coldToFrozenDir or coldToFrozenScript under each <index> stanza

 

Mount these FlashArray volumes at the same mount point (for example, “/hot” or “/cold”) on every relevant indexer so that the paths in indexes.conf are valid on each indexer.
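
As an illustration of how an index can reference the volume stanzas above, here is a minimal Python sketch that renders an index stanza for indexes.conf. The index name web_logs and the db/colddb/thaweddb subdirectory names are assumptions chosen for the example, not values from this document; adjust them to your environment.

```python
def render_index_stanza(index_name):
    """Render a hypothetical indexes.conf stanza that places hot/warm and cold
    buckets on the [volume:hot] and [volume:cold] volumes."""
    lines = [
        f"[{index_name}]",
        # $_index_name is expanded by Splunk to the name of the index.
        "homePath = volume:hot/$_index_name/db",
        "coldPath = volume:cold/$_index_name/colddb",
        # thawedPath is not defined in terms of a volume here; it uses $SPLUNK_DB.
        "thawedPath = $SPLUNK_DB/$_index_name/thaweddb",
    ]
    return "\n".join(lines)


# "web_logs" is an illustrative index name.
print(render_index_stanza("web_logs"))
```

Because every indexer mounts its volumes at the same mount points, the same rendered stanza can be deployed unchanged to all indexers.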

As Pure FlashArray volumes are always thin-provisioned, Splunk Administrators can provision a large-sized volume to avoid adding additional volumes to meet the space growth.

Keep all the FlashArray volumes for all the indexers in a cluster at the same size to avoid imbalanced space usage.

Linux Mount options

You can use either the EXT4 or XFS filesystem on the Splunk indexers to mount the FlashArray volumes.  As buckets age and directories are removed, TRIM/unmap commands must be issued to the underlying block storage to reclaim the space.  To accomplish this, use the discard mount option, which issues the TRIM command to FlashArray to release the space occupied by those directories.

Following are the recommended mount options:

discard,noatime

If the discard option is not a preferred option based on your standard operating procedure, make sure to issue the fstrim command on the mount point periodically, once a day or once a week, to release the space at the FlashArray level.
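
If you choose periodic trimming instead of the discard option, the job can be as small as the following Python sketch, which runs fstrim against each mount point. The mount points /hot and /cold are the example paths used in this document, the scheduling mechanism (cron, systemd timer, or similar) is left to you, and the script assumes it runs with root privileges.

```python
import subprocess


def fstrim_command(mount_point):
    # -v makes fstrim report how many bytes were trimmed.
    return ["fstrim", "-v", mount_point]


def trim_mounts(mount_points, runner=subprocess.run):
    """Run fstrim on each mount point; stops at the first failure."""
    for mount_point in mount_points:
        runner(fstrim_command(mount_point), check=True)


# Example call (requires root and real mount points):
# trim_mounts(["/hot", "/cold"])
```

The runner parameter exists only so the function can be exercised without actually trimming a filesystem.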

Logical Volume Manager

We recommend using the logical volume manager (LVM) at the indexer level to attach the FlashArray volume to a volume group and carve the logical volume for the Hot/Warm or Cold tier out of it.  This allows storage to be added dynamically when the indexer needs more space for a Hot/Warm or Cold tier hosted on Pure FlashArray.

Linux Best Practices

The Linux recommended settings for FlashArray, including multipathing queue settings, are documented under the Solutions page at the Pure Storage support site.

https://support.purestorage.com/Solu...ended_Settings

Cold tier on Pure FlashBlade

FlashBlade filesystems

  • Always create a separate NFS filesystem for every indexer to host the Cold tier.

  • FlashBlade filesystems support both NFSv3 and NFSv4.1 protocols.  Choose either one as per your requirement.

  • Because FlashBlade filesystems are always thin-provisioned, Splunk Administrators can provision a large filesystem to avoid resizing it as space usage grows.

  • Do not set the hard limit parameter for the filesystem size as this will limit the flexibility of adding more space as needed.

  • Keep the NFS filesystems for all the indexers in a cluster at the same size to avoid unbalanced space usage.

Linux Mount options

 Use the following mount options to mount the NFS filesystem on the indexer nodes for the Cold tier. 

rw,bg,nointr,hard,tcp,vers=[3|4.1],rsize=16384

  • Select either NFS protocol version, "3" or "4.1", for the vers option.
    • Alternatively, use the nfs4 mount type without the vers option.
      $ mount -t nfs4 -o rw,bg,hard,nointr,tcp,rsize=16384 10.21.214.203:/splunk-cold01 /cold
  • Always mount the filesystem with "hard" mount option and do not use "soft" NFS mounts.
  • Do not disable attribute caching.
  • Do not specify the wsize option as the host can get the default size offered by FlashBlade (512K).
  • To persist these changes across reboots, please include them in the /etc/fstab file as given below. The IP address specified below refers to the data VIP from the FlashBlade.
10.21.214.200:/splunk-cold01 /cold nfs rw,bg,nointr,hard,tcp,vers=3,rsize=16384
or
10.21.214.200:/splunk-cold01 /cold nfs4 rw,bg,nointr,hard,tcp,rsize=16384

Note: Changing the default rsize from 512K to 16K or 32K offers a better read performance.  
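
To keep the fstab entries consistent across many indexers, the line can be generated rather than typed by hand. The following Python sketch is one way to do that under the options recommended above; the function name fstab_entry is hypothetical, and the data VIP and export name are the example values from this document.

```python
def fstab_entry(data_vip, export, mount_point, vers="3", rsize=16384):
    """Build an /etc/fstab line for a Cold-tier NFS filesystem."""
    if vers not in ("3", "4.1"):
        raise ValueError("vers must be '3' or '4.1'")
    # 'hard' is always included; wsize is deliberately left out so the host
    # takes the default offered by FlashBlade.
    options = f"rw,bg,nointr,hard,tcp,vers={vers},rsize={rsize}"
    return f"{data_vip}:/{export} {mount_point} nfs {options}"


print(fstab_entry("10.21.214.200", "splunk-cold01", "/cold"))
```

With the default arguments this reproduces the NFSv3 fstab line shown above.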

Splunk does not recommend placing the Hot/Warm tier on NFS.  Please see the Splunk documentation for more details.

Splunk SmartStore Architecture

Remote warm tier on FlashBlade

The minimum Purity//FB version to run Splunk SmartStore on FlashBlade is 2.3.0. 

This includes all the object-related functionalities that are required to host Splunk SmartStore index data on FlashBlade using S3 protocol.

Remote Volume

The volume definition for the remote storage in indexes.conf points to the remote object store where Splunk SmartStore stores the warm data.  The remote volume definition looks like the following.

[volume:remote_store]
storageType = remote
path = s3://<bucket name>
# The following S3 settings are required only if you’re using the access and secret keys 
remote.s3.access_key = <access key of the account that holds the bucket>
remote.s3.secret_key = <secret key of the account that holds the bucket>
remote.s3.endpoint = http://<FlashBlade-data-vip>

remote.s3.supports_versioning = false 
remote.s3.list_objects_version = v2

[splunk_index]
remotePath = volume:remote_store/$_index_name
repFactor = auto
homePath = <home path specification>

  • Each remote volume definition can have only one path, meaning a single S3 bucket name.

  • The remote volume which refers to the S3 bucket on a FlashBlade should be limited to an indexer cluster or a standalone indexer.  The same S3 bucket cannot be shared across two clusters or standalone indexers.

  • An indexer cluster or a standalone indexer can have one or more remote volumes.  

  • A SmartStore index is limited to a single remote volume and cannot be spread across multiple remote volumes.

  • All peer nodes of an indexer cluster should use the same SmartStore configurations.

Please see this article for the recommended indexes.conf settings for Splunk SmartStore on Pure Storage FlashBlade.

 

Splunk related settings 

Bucket Size

Splunk has predefined sizes for the bucket that can be configured under the maxDataSize parameter in indexes.conf as

maxDataSize = <positive integer> | auto | auto_high_volume

Default is “auto” at 750MB whereas auto_high_volume is 10GB on 64-bit systems and 1GB on 32-bit systems.

The general recommendation by Splunk for a high volume environment is to set the bucket size to auto_high_volume but for Splunk SmartStore indexes, the specific recommendation is to use “auto” (750MB) or lower. This is to avoid timeouts when downloading big sized buckets from the remote object store back to the cache.

Recommended setting:

maxDataSize = auto

TSIDX Reduction

SmartStore doesn’t support TSIDX reduction. Do not set the parameter enableTsidxReduction to “true” for SmartStore indexes.

Recommended setting:

enableTsidxReduction = false

Bloom Filters

Bloom filters play a key role with SmartStore in reducing the download of tsidx data from the remote object store to the cache. Do not set the parameter createBloomfilter to “false.”

Recommended setting:

createBloomfilter = true

Versioning

FlashBlade supports versioning, which SmartStore recommends to protect against accidental deletion.  Splunk data is generally deleted when it passes the configured data retention period.  Setting remote.s3.supports_versioning to false on S3 storage that supports versioning, such as FlashBlade, lets Splunk place a delete marker on objects rather than physically deleting them, which protects against accidental deletion.  If this parameter is set to true, which is the default, Splunk SmartStore permanently deletes all versions of the data when it ages out, and the data cannot be recovered.

Recommended setting:

remote.s3.supports_versioning = false 

If protection against any accidental deletion is required, it is imperative that the versioning setting is enabled at the FlashBlade bucket level upon creation as the default is no versioning.  If accidental deletion protection is not required, the versioning at the FlashBlade bucket level can be left at default (none).  The following picture shows how to enable the versioning of a bucket through FB GUI.

[Figure: enabling bucket versioning in the FlashBlade GUI]

If the Purity//FB version (below 3.0) doesn’t support online enablement of versioning, use the following AWS CLI command to enable bucket versioning.  The --endpoint-url option directs the request to the FlashBlade instead of AWS.

aws s3api put-bucket-versioning --endpoint-url http://<FlashBlade-data-vip> --bucket <bucket-name> --versioning-configuration Status=Enabled

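
If you prefer Python to the AWS CLI, the same S3 PutBucketVersioning request can be issued through a boto3 S3 client. The sketch below is a minimal example under that assumption; the function name enable_bucket_versioning is hypothetical, and the access key, secret key, data VIP, and bucket name in the commented wiring are placeholders.

```python
def enable_bucket_versioning(s3_client, bucket_name):
    """Enable versioning on a bucket through the S3 PutBucketVersioning API."""
    s3_client.put_bucket_versioning(
        Bucket=bucket_name,
        VersioningConfiguration={"Status": "Enabled"},
    )


# Example wiring (placeholders, not real values):
# import boto3
# client = boto3.client(
#     "s3",
#     use_ssl=False,
#     aws_access_key_id="<access_key>",
#     aws_secret_access_key="<secret_key>",
#     endpoint_url="http://<FlashBlade-data-vip>",
# )
# enable_bucket_versioning(client, "<bucket-name>")
```
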
Space Reclamation

Because remote.s3.supports_versioning is set to false, data is not physically removed when it ages out if versioning is enabled at the FlashBlade bucket level.  It is therefore recommended to set a lifecycle policy on the FlashBlade S3 bucket to physically remove the deleted data and reclaim the space.

Note, if the versioning at FlashBlade bucket level is not enabled but remote.s3.supports_versioning is set to false, any object deletes will physically remove the object.

Starting with the Purity//FB 3.1 release, the lifecycle policy can be set through the GUI.

To set the policy, select the account under the Object Store and click on the bucket.  It should bring up the page which should have the Lifecycle Rules option.

[Figure: Lifecycle Rules option on the bucket page in the FlashBlade GUI]

Click the + symbol on the right against the Lifecycle Rules and specify a rule name and enter your desired days to keep the previous versions before they are removed physically.  In the example below, we have created a rule named rule1 with 3 days to keep the previous versions.  After 3 days, the previous versions of the objects are removed.  Please choose the days for "Keep Previous Version For" based on your requirements.  The minimum you can configure is 1 day.

 

Do not set the "Keep Current Version" options in the lifecycle policy, as this will remove active objects that are still used by Splunk.  Only set "Keep Previous Version" if you want to reclaim the space held by deleted objects.

 

Purity//FB 3.1.x

[Figure: lifecycle rule settings in Purity//FB 3.1.x]

Purity//FB 3.2.4 & above

[Figure: lifecycle rule settings in Purity//FB 3.2.4 and above]

For any Purity//FB version below 3.1, the lifecycle policy can only be set through python code and not through the GUI.

The following sample Python code sets the lifecycle policy of a given bucket in a FlashBlade.  It removes all noncurrent (previous) versions of objects, that is, deleted or overwritten objects, after 3 days.  Please update the value for NoncurrentDays as per your requirement.

import boto3

# Connect to the FlashBlade S3 endpoint (replace the placeholders).
s3 = boto3.resource(
    service_name='s3',
    use_ssl=False,
    aws_access_key_id='<access_key>',
    aws_secret_access_key='<secret_key>',
    endpoint_url='http://<FB data-vip>',
)

# Expire noncurrent (previous) object versions after NoncurrentDays days.
s3.meta.client.put_bucket_lifecycle_configuration(
    Bucket='<bucket-name>',
    LifecycleConfiguration={
        'Rules': [
            {
                'ID': 'rule1',
                'Filter': {},
                'Status': 'Enabled',
                'NoncurrentVersionExpiration': {'NoncurrentDays': 3},
            }
        ]
    },
)

Multi-part upload/download

FlashBlade supports multipart upload and download.  The default setting of 128MB should be sufficient, and we recommend not modifying it unless a new value has been proven to improve throughput.

List Object Version

FlashBlade supports objects listing version V2 which is much more performant than V1.  To improve performance when Splunk is dealing with objects, V2 is highly recommended.

Recommended setting:

remote.s3.list_objects_version = v2

URL Version

The parameter remote.s3.url_version is used for parsing the endpoint and communicating with the remote storage. The parameter allows options v1 or v2. 

In v1, the bucket is the first element of the path like mydomain.com/bucket/remaining/path.

In v2, the bucket is the outermost element of the subdomain like bucket.mydomain.com/remaining/path.

While FlashBlade supports either version, we have observed that using v2 with Splunk causes unintended side effects, such as objects not getting deleted or the Splunk rfs command line not working.  Hence the recommendation is to leave the parameter unset, in which case it defaults to v1.

Do not set the parameter remote.s3.url_version to v2. 
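
The difference between the two addressing styles described above can be sketched in a few lines of Python. This is only an illustration of where the bucket name appears in the URL; the function name s3_object_url is hypothetical and is not part of Splunk or FlashBlade.

```python
def s3_object_url(endpoint_host, bucket, key, url_version="v1"):
    """Show where the bucket name appears for each remote.s3.url_version value."""
    if url_version == "v1":
        # Path style: the bucket is the first element of the path.
        return f"{endpoint_host}/{bucket}/{key}"
    if url_version == "v2":
        # Virtual-hosted style: the bucket is the outermost subdomain.
        return f"{bucket}.{endpoint_host}/{key}"
    raise ValueError("url_version must be 'v1' or 'v2'")


print(s3_object_url("mydomain.com", "bucket", "remaining/path", "v1"))
print(s3_object_url("mydomain.com", "bucket", "remaining/path", "v2"))
```

Note that v2 also requires the bucket-qualified host name to resolve, which is another reason the v1 default is simpler to operate.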

Cache Manager settings

The Cache Manager plays a vital role in maximizing search efficiency by managing the local cache intelligently. It favors holding the buckets that have a high chance of participating in future searches and, when the cache fills up, evicts the buckets that are least likely to participate in future searches. For more information on how the CacheManager works, please see SmartStore Cache Manager.

CacheManager settings generally have “global” scope and are configured under the [cachemanager] stanza in server.conf.  In an indexer cluster environment, the settings are configured on each peer node.

Except for the “recency” settings, CacheManager settings cannot be applied at an index level.

eviction_policy

Splunk recommends not to change the default eviction policy of lru which evicts the buckets that are least recently used.  

max_cache_size

Specify the maximum size, in megabytes, for the disk partition that hosts the cache.  This setting applies per indexer and is not the maximum cache size across the cluster.  When the occupied space of the cache exceeds max_cache_size, or the free space on the partition falls below the sum of minFreeSpace and eviction_padding, the cache manager starts to evict data.
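
The eviction trigger can be expressed as a simple predicate. The Python sketch below reads the second condition as applying to the free space on the cache partition, which is an interpretation rather than something this document spells out; the function name and the numbers in the example are illustrative only, and max_cache_size is assumed to be a positive value.

```python
def eviction_needed(cache_used_mb, partition_free_mb,
                    max_cache_size_mb, min_free_space_mb, eviction_padding_mb):
    """Return True when either eviction condition is met (all values in MB)."""
    over_cache_limit = cache_used_mb > max_cache_size_mb
    low_free_space = partition_free_mb < (min_free_space_mb + eviction_padding_mb)
    return over_cache_limit or low_free_space


# Illustrative values: a 500,000 MB cache limit with 5,000 MB minFreeSpace
# and 5,120 MB eviction_padding.
print(eviction_needed(400_000, 50_000, 500_000, 5_000, 5_120))
print(eviction_needed(400_000, 8_000, 500_000, 5_000, 5_120))
```

In the second call the cache is under its size limit, but the partition has less free space than minFreeSpace plus eviction_padding, so eviction is still triggered.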

hotlist_recency_secs

The Splunk SmartStore eviction policy generally evicts the least recently searched buckets first: the cache manager keeps buckets that were searched recently and evicts those searched least recently, even if a bucket was recently created.

If most of your searches run against recently ingested data, it makes sense to protect that data from eviction using the hotlist_recency_secs parameter.  This parameter sets a cache retention period based on the age (recency) of the warm buckets in the cache, protecting recent buckets over other buckets.  This setting overrides the eviction policy.

The recency or the bucket age is determined by the interval between the bucket’s latest time and the current time.  As the name implies, the setting is in seconds and the default is 86400 seconds or 1 day.  The CacheManager will not evict the buckets until they reach this configured setting unless all other buckets have already been evicted.

This setting can be applied at the index level or at the global level within the indexes.conf file, but the recommendation is to set it at the index level so that data in critical indexes is protected in preference to non-critical indexes.

For cache eviction to work optimally, set this parameter with the max_cache_size setting in mind.  Do not set hotlist_recency_secs to a value that would require more cache than max_cache_size provides, as this can impair cache eviction.

For example, if daily ingest adds 100GB of new buckets, a 500GB cache can hold only five days of recent data, so any hotlist_recency_secs value above 5 days would prevent cache eviction from working optimally.  Alternatively, if your searches are always limited to data ingested within the last 30 days, set hotlist_recency_secs to 2592000 seconds (30 days) and make sure max_cache_size can hold 30 or more days of daily ingest.

Recommended setting:

Please set the hotlist_recency_secs parameter at the index level for critical indexes in indexes.conf to protect the data in the cache from eviction based on the required age and in alignment with the max_cache_size settings.  
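
The sizing arithmetic in the example above can be captured in a short Python sketch. It uses the same simplification as the example, namely that the cache holds nothing but the daily new buckets; the function names are hypothetical.

```python
SECONDS_PER_DAY = 86400


def protectable_days(max_cache_size_gb, daily_bucket_growth_gb):
    """Whole days of new buckets that fit in the cache."""
    return int(max_cache_size_gb // daily_bucket_growth_gb)


def max_hotlist_recency_secs(max_cache_size_gb, daily_bucket_growth_gb):
    """Largest hotlist_recency_secs the cache can honor under this model."""
    return protectable_days(max_cache_size_gb, daily_bucket_growth_gb) * SECONDS_PER_DAY


def required_cache_gb(protect_days, daily_bucket_growth_gb):
    """Cache size needed to protect the given number of days."""
    return protect_days * daily_bucket_growth_gb


print(protectable_days(500, 100))          # 5 days
print(max_hotlist_recency_secs(500, 100))  # 432000 seconds
print(required_cache_gb(30, 100))          # 3000 GB for a 30-day window
```

Treat the results as a lower bound for max_cache_size, since a real cache also holds buckets fetched back for older searches.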

hotlist_bloom_filter_recency_hours

Similar to hotlist_recency_secs, the hotlist_bloom_filter_recency_hours parameter protects the metadata files like bloomfilter from eviction.  The use of bloom filters during searches avoids the need to download larger bucket objects like the rawdata journal file or the time series index files (tsidx) from the remote object storage.

The default setting is 360 hours or 15 days.  With this setting, the cache manager defers eviction of smaller files like the bloomfilter until the interval between the bucket’s latest time and the current time exceeds this setting. If searches are limited to data ingested within the last n days, set this parameter for all the critical indexes to the number of hours that corresponds to n days.  For example, if searches are limited to the last 30 days, set this parameter to 720.

Recommended setting:

Please set the hotlist_bloom_filter_recency_hours parameter at the index level for critical indexes in indexes.conf to protect the smaller metadata files in the cache from eviction based on the required age.
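
Since both recency parameters are derived from the same protection window, they can be generated together. The Python sketch below renders a per-index stanza for indexes.conf from a number of days; the index name critical_index and the function name are assumptions made for the example.

```python
def recency_stanza(index_name, protect_days):
    """Render per-index recency settings for a protection window in days."""
    return "\n".join([
        f"[{index_name}]",
        # hotlist_recency_secs is expressed in seconds.
        f"hotlist_recency_secs = {protect_days * 86400}",
        # hotlist_bloom_filter_recency_hours is expressed in hours.
        f"hotlist_bloom_filter_recency_hours = {protect_days * 24}",
    ])


# "critical_index" is an illustrative index name; 30 days matches the
# example window used in this section.
print(recency_stanza("critical_index", 30))
```

For a 30-day window this yields 2592000 seconds and 720 hours, matching the values given in the text.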

 

Splunk Multisite SmartStore Architecture

Splunk Multisite SmartStore is generally deployed to meet disaster recovery requirements.  A Splunk Multisite SmartStore deployment on premises is limited to two sites, each hosted in an on-premises data center, in active-active mode.  This means input data can be ingested through both sites.

Prerequisites

1) To host the Splunk Multisite data on FlashBlade, you would need two FlashBlades, each located in an on-premises data center with the minimum Purity//FB version of 3.3.3 that offers the "multi-site-writable" bucket option.  This feature includes all the object-related functionalities that are required to host Splunk SmartStore index data on FlashBlades using S3 protocol as well as the object replication requirements as laid out by Splunk in their documentation.

The minimum Purity//FB version to run Splunk Multisite SmartStore on FlashBlades is 3.3.3. 

Please see the document to set up the FlashBlades for Splunk Multisite SmartStore.

2) Third-party VIP or Global Server Load Balancing (GSLB) to route the traffic from the peer nodes to the object store (FlashBlade) hosted on its site's location.  In case of a site failure or the local FlashBlade failure, the VIP or GSLB should be able to reroute the peer's traffic as necessary to the FlashBlade in the other site.

The remote.s3.endpoint refers to the FlashBlade data VIP that is exposed through the GSLB, and hence should reference a URI rather than a hard-coded IP address.

Please see Splunk's documentation for the Splunk Multisite SmartStore deployment requirements on premises.

Remote Volume

The volume definition for the remote storage in indexes.conf points to the remote object store where Splunk SmartStore stores the warm data.  The remote volume definition looks like the following.

[volume:remote_store]
storageType = remote
# The bucket name should be same across both FlashBlades
path = s3://<bucket name>  

# The access_key and secret_key should be same across both FlashBlades 
remote.s3.access_key = <access key of the account that holds the bucket>
remote.s3.secret_key = <secret key of the account that holds the bucket>

# Make sure to include the Object store URI through third-party VIP or GSLB
remote.s3.endpoint = http://<FlashBlade-URI>

remote.s3.supports_versioning = false 
remote.s3.list_objects_version = v2

[splunk_index]
remotePath = volume:remote_store/$_index_name
repFactor = auto
homePath = <home path specification>
  • As the same indexes.conf file is deployed across the two sites, the parameters under the volume:remote_store are applicable to both the FlashBlades.
    • The S3 bucket name as referred by the path parameter should be the same on both the FlashBlades.
    • The access_key, secret_key should be the same on both the FlashBlades.
  • The remote.s3.endpoint should point to the FlashBlade URI through a third-party VIP (Virtual IP) or GSLB (Global Server Load Balancing).
    • The VIP or GSLB routes traffic from the Splunk indexer nodes to the FlashBlade hosted on its site's location.  In case of a FlashBlade failure, the VIP or GSLB reroutes the traffic as necessary to the remaining active FlashBlade.
    • Using the IP address of the FlashBlade data VIP directly will not work with Multisite SmartStore.
  • Please see Splunk's documentation for more details on the deployment topology.

Replication Lag

Depending on the distance between the two FlashBlades, there will be a replication lag while the FlashBlades replicate objects asynchronously.  To prevent the peer indexer on the other site from uploading its own copy before object store replication completes, update the following parameter.

remote_storage_upload_timeout

The parameter remote_storage_upload_timeout under the [clustering] stanza in server.conf should be set, on all the indexers across both sites, to a time (in seconds) higher than the maximum replication lag between the two FlashBlades.

Recommended setting:

remote_storage_upload_timeout = 600

Note:  This setting should be updated to a higher value if you notice the replication lag goes beyond 600 seconds.  If the latency between two FlashBlades is over 200ms, there is a good chance the 600 seconds might not be sufficient.  Please review and update accordingly.
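
One way to choose the value is to derive it from replication lag you have actually observed. The Python sketch below keeps the recommended 600 seconds as a floor and otherwise applies a safety margin to the worst observed lag; the function name and the 1.5x headroom factor are assumptions for illustration, not Pure Storage or Splunk guidance.

```python
import math


def recommended_upload_timeout(observed_lag_secs, floor_secs=600, headroom=1.5):
    """Pick remote_storage_upload_timeout from observed replication lag samples."""
    worst_lag = max(observed_lag_secs, default=0)
    # Never go below the recommended floor; otherwise add headroom to the worst lag.
    return max(floor_secs, math.ceil(worst_lag * headroom))


print(recommended_upload_timeout([120, 300]))  # 600
print(recommended_upload_timeout([500, 900]))  # 1350
```

How you collect the lag samples depends on your monitoring; the function only needs a list of lag values in seconds.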