MongoDB on FlashArray Implementation and Best Practices
MongoDB is a source-available, document-oriented database management system developed and distributed by MongoDB Inc. It can be obtained by following the steps provided in the MongoDB documentation or downloaded directly from GitHub.
MongoDB can be deployed in one of three modes:
- Standalone - All of the datasets and resources are located in a single system. This mode of operation is not recommended for production use but can instead be used for test and development purposes.
- Replica Set - A replica set is a group of deployments that maintain the same data set. This mode of operation provides high availability and redundancy for production deployments.
- Sharded Cluster - A sharded cluster provides horizontal scaling capabilities for large data sets. Data sets are distributed across a group of systems.
Hardware Environment Requirements
Block storage on FlashArray can be accessed using a number of protocols. These protocols and their requirements are listed as follows:

Protocol | Requirements
---|---
Fibre Channel Protocol (FCP) for SCSI | A Fibre Channel fabric with an established zone between the initiator (MongoDB deployment system) and target (FlashArray).
Internet Protocol for SCSI (iSCSI) | A high-speed (minimum 10 Gb/s) TCP/IP network, with the initiator (MongoDB deployment system) and target (FlashArray) having either routable or direct line-of-sight connectivity between assigned addresses.
NVM Express over RoCEv2 Fabric (NVMe-oF/RoCEv2) | RoCEv2 uses the IPv4 User Datagram Protocol (UDP) as its transport, which requires a lossless network; the converged Ethernet network switches must support industry-standard congestion control mechanisms, and guides on how to configure the switches and topologies are available for the supported switch platforms. The MongoDB deployment system requires dedicated Host Bus Adapters (HBAs) that support RoCEv2. More information on supported hardware and configurations for NVMe-oF/RoCEv2 can be found in the NVMe-oF Support Matrix.
NVM Express over Fibre Channel Fabric (FC-NVMe) | NVMe/FC uses standard Fibre Channel mechanisms for the transport of NVMe commands. More information on supported hardware and configurations for NVMe/FC can be found in the NVMe-oF Support Matrix.
NVM Express over Transmission Control Protocol (NVMe/TCP) | NVMe/TCP uses standard TCP mechanisms for the transport of NVMe commands. More information on supported hardware and configurations for NVMe/TCP can be found in the NVMe-oF Support Matrix.
Recommended Configuration for MongoDB on FlashArray
Operating System Requirements
MongoDB can be deployed on a number of platforms. A platform consists of a CPU architecture and operating system. The current supported platforms can be found in the MongoDB documentation.
The relevant operating systems and associated recommendations can be found below:

Operating System | Recommendations
---|---
Red Hat Enterprise Linux / CentOS / Oracle Linux / Ubuntu / SUSE / Debian | The recommended settings for Linux operating systems can be found in the Linux best practices for FlashArray. In addition to the existing best practices, increase the number of requests per volume; a higher value allows an individual volume to service more storage requests at any one time. Add the rule shown below this table to the udev rules file for FlashArray to set the number of requests for DM devices to 1024 or higher (note that each request uses 1 MB of memory).
Microsoft Windows Server | The best practices for Microsoft Windows can be set using the Test-WindowsBestPractices cmdlet.

udev rule to increase the maximum number of requests for a single volume:

ACTION=="add|change", KERNEL=="dm-[0-9]*", SUBSYSTEM=="block", ENV{DM_NAME}=="3624a937*", ATTR{queue/nr_requests}="1024"
File System Configuration
The recommended file systems for MongoDB on FlashArray are NTFS (Microsoft Windows) and XFS (Linux).
Microsoft Windows
The default allocation unit size (4K) for NTFS filesystems works well with MongoDB.
Linux
XFS
When using the XFS file system for MongoDB database files, the default options are typically all that is required.
XFS file systems can be created on a FlashArray volume using the command:
mkfs.xfs /dev/mapper/<device>
The discard and noatime options in /etc/fstab should be used with XFS in the majority of cases.
/dev/mapper/<device> /mountpoint xfs discard,noatime 0 0
If using iSCSI connectivity to FlashArray, ensure the _netdev and nofail options are also present:
/dev/mapper/<device> /mountpoint xfs _netdev,nofail,discard,noatime 0 0
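The fstab entry above can also be assembled in a script before it is written out. The sketch below builds the line from shell variables; the device path and mountpoint are hypothetical examples and should be substituted with the real values for the environment.

```shell
# Build an /etc/fstab entry for an XFS volume on FlashArray over iSCSI.
# DEVICE and MOUNTPOINT are hypothetical examples - substitute your own.
DEVICE="/dev/mapper/mongodata"
MOUNTPOINT="/var/lib/mongo"
OPTIONS="_netdev,nofail,discard,noatime"

ENTRY="${DEVICE} ${MOUNTPOINT} xfs ${OPTIONS} 0 0"
echo "${ENTRY}"
```

After reviewing the generated line, it can be appended to /etc/fstab (for example with `echo "${ENTRY}" | sudo tee -a /etc/fstab`), after which mounting the mountpoint should succeed.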
For systems using Linux kernel 4.13 or later, the "nobarrier" option is deprecated for XFS. To increase the potential performance of the volume, add the following udev rules to the existing rules file:
FCP and iSCSI
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/write_cache}="write through"
ACTION=="add|change", KERNEL=="dm-[0-9]*", SUBSYSTEM=="block", ENV{DM_NAME}=="3624a937*", ATTR{queue/write_cache}="write through"
NVMe-oF connectivity
ACTION=="add|change", KERNEL=="nvme*[!0-9]", SUBSYSTEM=="block", ATTR{queue/write_cache}="write through"
ACTION=="add|change", KERNEL=="dm-[0-9]*", SUBSYSTEM=="block", ENV{DM_NAME}=="eui.00723ec7b5b427*", ATTR{queue/write_cache}="write through"
Once the rules have been added, reload and apply them using the udevadm utility:
udevadm control --reload-rules && udevadm trigger
Adding an entry to /etc/fstab
Entries in /etc/fstab can use either the device path or the UUID of the filesystem.
To get the UUID of a file system, use the blkid command and match the device to its respective UUID:
[root@DB-01 ~]# blkid
/dev/sdb: PTUUID="91360acd-7332-47d9-9027-0300c7e3a081" PTTYPE="gpt"
/dev/sda: PTUUID="91360acd-7332-47d9-9027-0300c7e3a081" PTTYPE="gpt"
/dev/sdc: PTUUID="91360acd-7332-47d9-9027-0300c7e3a081" PTTYPE="gpt"
/dev/mapper/3624a93708488b6dac70f42a20001ec55: PTUUID="91360acd-7332-47d9-9027-0300c7e3a081" PTTYPE="gpt"
/dev/sdd: PTUUID="91360acd-7332-47d9-9027-0300c7e3a081" PTTYPE="gpt"
/dev/mapper/3624a93708488b6dac70f42a20001ec55p1: UUID="D759-B5A6" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="8f317ff7-97f5-48ab-83d8-f54d647fb390"
/dev/mapper/3624a93708488b6dac70f42a20001ec55p2: UUID="97d13054-20c7-436e-b6ab-ef8b8f3ce46b" BLOCK_SIZE="512" TYPE="xfs" PARTUUID="2644a548-dde5-4ac8-8679-c100011a0d78"
/dev/mapper/3624a93708488b6dac70f42a20001ec55p3: UUID="GXujto-h3yl-V3hq-jFZs-3xd6-RhLB-0vFf8l" TYPE="LVM2_member" PARTUUID="380fbf39-eb6e-4bf0-ae25-f4abd6f02803"
/dev/mapper/rhel-root: UUID="5547b1a3-590a-4d54-9d0b-710714ca7e52" BLOCK_SIZE="512" TYPE="xfs"
/dev/mapper/rhel-swap: UUID="70470107-9368-4d32-82d8-7620f1cdc665" TYPE="swap"
/dev/mapper/rhel-home: UUID="0e08dae8-3d93-451c-a021-b3fe58f9b464" BLOCK_SIZE="512" TYPE="xfs"
/dev/nvme0n1: UUID="36202f29-95e8-49d2-a200-4057712b9236" BLOCK_SIZE="512" TYPE="xfs"
/dev/nvme0n2: UUID="489e7195-46b5-466a-8770-9aa7938c8afb" BLOCK_SIZE="512" TYPE="xfs"
/dev/mapper/eui.00668f1ab9b15f4b24a937c400011884: UUID="36202f29-95e8-49d2-a200-4057712b9236" BLOCK_SIZE="512" TYPE="xfs"
/dev/nvme1n1: UUID="36202f29-95e8-49d2-a200-4057712b9236" BLOCK_SIZE="512" TYPE="xfs"
/dev/nvme3n5: UUID="41e4a009-b98a-4d48-9d69-f45e717485e2" BLOCK_SIZE="512" TYPE="xfs"
/dev/mapper/eui.00668f1ab9b15f4b24a937c400011890: UUID="489e7195-46b5-466a-8770-9aa7938c8afb" BLOCK_SIZE="512" TYPE="xfs"
/dev/mapper/eui.00668f1ab9b15f4b24a937c400011898: UUID="0f4da946-a86e-4fd3-bcd3-0d74d88ef83b" BLOCK_SIZE="512" TYPE="xfs"
/dev/mapper/eui.00668f1ab9b15f4b24a937c4000118a0: UUID="9a5e14cc-b226-4f7c-91a5-f8397a73b5bd" BLOCK_SIZE="512" TYPE="xfs"
/dev/mapper/eui.00668f1ab9b15f4b24a937c4000118a1: UUID="41e4a009-b98a-4d48-9d69-f45e717485e2" BLOCK_SIZE="512" TYPE="xfs"
Then use the UUID in place of the device path:
UUID=489e7195-46b5-466a-8770-9aa7938c8afb /mountpoint xfs _netdev,nofail,noatime,discard 0 0
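For scripting, the UUID can be extracted from blkid output rather than copied by hand. The sketch below parses a sample line taken from the output above with sed; on a live system, `blkid -s UUID -o value <device>` returns the same value directly.

```shell
# Extract the filesystem UUID from a blkid output line.
# On a live system: uuid=$(blkid -s UUID -o value /dev/nvme0n2)
line='/dev/nvme0n2: UUID="489e7195-46b5-466a-8770-9aa7938c8afb" BLOCK_SIZE="512" TYPE="xfs"'
uuid=$(printf '%s\n' "$line" | sed -n 's/.* UUID="\([^"]*\)".*/\1/p')
echo "$uuid"
```

The leading space in the sed pattern ensures UUID= is matched rather than PTUUID= or PARTUUID= on lines that contain those fields.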
Increase file limits
The file limits (ulimit) settings for UNIX platforms may need to be adjusted based on the workload requirements. To do this, edit /etc/security/limits.conf and add the following before the end of the file:
mongod soft nproc 64000
mongod hard nproc 64000
mongod soft nofile 64000
mongod hard nofile 64000
Once edited, save the file and reboot the system.
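The four limits.conf entries above follow a single pattern, so they can also be generated in one step. This is a sketch; review the output before writing it to /etc/security/limits.conf.

```shell
# Emit the mongod limits entries (soft and hard, for nproc and nofile).
for type in soft hard; do
  for item in nproc nofile; do
    echo "mongod ${type} ${item} 64000"
  done
done
```

Piping the output through `sudo tee -a /etc/security/limits.conf` appends all four lines at once.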
Lower readahead values
MongoDB disk access patterns are generally random, so there is little benefit in caching filesystem data for sequential read operations.
The readahead setting can be found on a per-block-device basis using the blockdev --report command:
blockdev --report
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw  8192   512  4096          0  4398046511104   /dev/sdn
rw  8192   512  4096          0   824633720832   /dev/sdp
rw  8192   512  4096          0  4398046511104   /dev/sdt
rw  8192   512  4096          0   824633720832   /dev/sds
rw  8192   512  4096          0  4398046511104   /dev/sdq
rw  8192   512  4096          0   824633720832   /dev/sdv
rw  8192   512  4096          0    53687091200   /dev/sdu
rw  8192   512  4096          0  4398046511104   /dev/sdw
rw  8192   512  4096          0    53687091200   /dev/sdx
rw  8192   512  4096          0   824633720832   /dev/dm-0
rw  8192   512  4096          0  4398046511104   /dev/dm-1
In this example /dev/dm-1 is the multipath device on which the MongoDB data files will reside.
To ensure the readahead value is persistently set for this device, a udev rule will be created for it. To find the device name for the udev rule, query it using udevadm. Using the command udevadm info <device>, the DM_NAME value can be found:
udevadm info /dev/dm-1
P: /devices/virtual/block/dm-1
N: dm-1
L: 50
S: disk/by-id/dm-name-3624a9370f2abf1c1b1c049fd000153b8
S: disk/by-id/dm-uuid-mpath-3624a9370f2abf1c1b1c049fd000153b8
S: disk/by-id/scsi-3624a9370f2abf1c1b1c049fd000153b8
S: disk/by-id/wwn-0x624a9370f2abf1c1b1c049fd000153b8
S: disk/by-uuid/45cdb491-820c-44c6-89f2-2a75ffca0a48
S: mapper/3624a9370f2abf1c1b1c049fd000153b8
E: DEVLINKS=/dev/disk/by-id/wwn-0x624a9370f2abf1c1b1c049fd000153b8 /dev/disk/by-id/dm-uuid-mpath-3624a9370f2abf1c1b1c049fd000153b8 /dev/mapper/3624a9370f2abf1c1b1c049fd000153b8 /dev/disk/by-uuid/45cdb491-820c-44c6-89f2-2a75ffca0a48 /dev/disk/by-id/scsi-3624a9370f2abf1c1b1c049fd000153b8 /dev/disk/by-id/dm-name-3624a9370f2abf1c1b1c049fd000153b8
E: DEVNAME=/dev/dm-1
E: DEVPATH=/devices/virtual/block/dm-1
E: DEVTYPE=disk
E: DM_ACTIVATION=0
E: DM_NAME=3624a9370f2abf1c1b1c049fd000153b8
E: DM_SERIAL=3624a9370f2abf1c1b1c049fd000153b8
E: DM_SUBSYSTEM_UDEV_FLAG0=1
E: DM_SUSPENDED=0
E: DM_TYPE=scsi
E: DM_UDEV_DISABLE_LIBRARY_FALLBACK_FLAG=1
E: DM_UDEV_PRIMARY_SOURCE_FLAG=1
E: DM_UDEV_RULES_VSN=2
E: DM_UUID=mpath-3624a9370f2abf1c1b1c049fd000153b8
E: DM_WWN=0x624a9370f2abf1c1b1c049fd000153b8
E: ID_FS_TYPE=xfs
E: ID_FS_USAGE=filesystem
E: ID_FS_UUID=45cdb491-820c-44c6-89f2-2a75ffca0a48
E: ID_FS_UUID_ENC=45cdb491-820c-44c6-89f2-2a75ffca0a48
E: MAJOR=253
E: MINOR=1
E: MPATH_DEVICE_READY=1
E: MPATH_SBIN_PATH=/sbin
E: MPATH_UNCHANGED=1
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=5992667
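The DM_NAME value can also be captured in a script instead of being read out of the full listing. The sketch below extracts it from a sample line of the output above with sed; on a live system, pipe `udevadm info /dev/dm-1` into the same sed expression.

```shell
# Extract DM_NAME from a line of `udevadm info` output.
# On a live system: dm_name=$(udevadm info /dev/dm-1 | sed -n 's/^E: DM_NAME=//p')
sample='E: DM_NAME=3624a9370f2abf1c1b1c049fd000153b8'
dm_name=$(printf '%s\n' "$sample" | sed -n 's/^E: DM_NAME=//p')
echo "$dm_name"
```

The captured value can then be substituted into the udev rule that follows.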
Once the DM_NAME has been obtained, update an existing rules file in /etc/udev/rules.d/ or create a new one with the entry below:
ACTION=="add|change", KERNEL=="dm-[0-9]*", SUBSYSTEM=="block", ENV{DM_NAME}=="3624a9370f2abf1c1b1c049fd000153b8", ATTR{bdi/read_ahead_kb}="32"
To apply the changes, execute the following:
udevadm control --reload-rules && udevadm trigger
Once the changes have been applied to the relevant volumes, blockdev --report should show a reduced RA (readahead) value for those volumes.
blockdev --report
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw  8192   512  4096          0  4398046511104   /dev/sdn
rw  8192   512  4096          0   824633720832   /dev/sdp
rw  8192   512  4096          0  4398046511104   /dev/sdt
rw  8192   512  4096          0   824633720832   /dev/sds
rw  8192   512  4096          0  4398046511104   /dev/sdq
rw  8192   512  4096          0   824633720832   /dev/sdv
rw  8192   512  4096          0    53687091200   /dev/sdu
rw  8192   512  4096          0  4398046511104   /dev/sdw
rw  8192   512  4096          0    53687091200   /dev/sdx
rw    32   512  4096          0   824633720832   /dev/dm-0
rw    32   512   512          0    53687091200   /dev/dm-1
Volume and File System Architectural Layout
MongoDB deployments only require that a data directory and a log directory (for textual logging used in debugging and troubleshooting) be created on each system where the instances will run.
Microsoft Windows
For Microsoft Windows environments, the data and log locations can be specified during the installation process. Prior to installation, a data volume and a log volume should be created on FlashArray, connected to the MongoDB host, formatted with a filesystem, and mounted at the required locations. The data and log directories can be located anywhere as long as the correct user permissions are set on the drive or mount point.
Linux
When installing MongoDB on Linux, the log and data directories are typically created at the following default locations:
- Data - /var/lib/mongo
- Log - /var/log/mongodb
These locations are preset in the MongoDB configuration file under the systemLog and storage sections:
# where to write logging data.
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log

# where to write data.
storage:
  dbPath: /var/lib/mongo
  journal:
    enabled: true
These locations can be changed as required. Prior to starting a MongoDB instance, the relevant data and log volumes should be created on FlashArray, connected to the MongoDB hosts, formatted with a filesystem, mounted with the relevant options, and owned by the appropriate MongoDB user (default is mongod).
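The preparation steps above can be sketched as a short script. The device path below is a hypothetical example, and the script only prints the commands so they can be reviewed before being run with root privileges.

```shell
# Print the steps to prepare a FlashArray volume for MongoDB data.
# DEV is a hypothetical multipath device name - substitute the real one,
# and DIR should match the dbPath configured for the instance.
DEV="/dev/mapper/mongodata"
DIR="/var/lib/mongo"

cat <<EOF
mkfs.xfs ${DEV}
mkdir -p ${DIR}
mount -o discard,noatime ${DEV} ${DIR}
chown mongod:mongod ${DIR}
EOF
```

A persistent deployment would also add the corresponding /etc/fstab entry so the volume is mounted at boot.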
MongoDB Configuration file options
MongoDB uses a configuration file to specify options for the instance during startup. The default location of this file on each operating system is:
- Microsoft Windows - C:\Program Files\MongoDB\Server\5.0\bin\mongod.cfg
- Linux - /etc/mongod.conf
For MongoDB on FlashArray volumes, it is recommended to turn off all compression options for the journal, indexes, and collections. To turn off compression, ensure that the following entries are added to the wiredTiger section (under storage) of the configuration file:
# Where and how to store data.
storage:
  dbPath: /var/lib/mongo
  journal:
    enabled: true
  wiredTiger:
    engineConfig:
      journalCompressor: none
    indexConfig:
      prefixCompression: false
    collectionConfig:
      blockCompressor: none