Crash Consistent SAP HANA Snapshots on FlashArray
It is possible to create crash consistent SAP HANA storage snapshots on FlashArray by using a protection group to contain all of the data and log volumes for the instance. A crash consistent storage snapshot is not application consistent, and therefore point-in-time recovery cannot be guaranteed. Creating a crash consistent snapshot of an SAP HANA instance's volumes is, however, a very fast and non-intrusive way of copying an SAP HANA system and all of its tenants to another location for test, development, and quality assurance purposes.
Creating a Crash Consistent Storage Snapshot
A FlashArray Protection Group is an administrative container for volumes which will be protected together. Volumes are added to a protection group either individually or by adding hosts or host groups (volumes attached to those hosts or host groups are added to the protection group by association). A protection group snapshot creates a snapshot of each volume that is part of the group, forming a consistency group - all of the volume snapshots in the group are consistent with one another at that point in time.
SAP HANA crash consistent storage snapshots need to be created as part of a protection group to ensure the log and data volumes are consistent with one another.
To create a protection group, navigate to the Protection section in the FlashArray GUI and select "Protection Groups". To create a protection group on the local array, select the "+" in the top right-hand corner of "Source Protection Groups".
When the "Create Protection Group" prompt is shown give it a name.
Once the protection group has been created, navigate to it and add the SAP HANA hosts, host groups, or volumes.
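For administrators who prefer the command line, the same setup can be sketched with the Purity CLI. The protection group and volume names below are examples only, and the flag names should be checked against the CLI help on your Purity version:

# Create a protection group containing the SAP HANA data and log volumes.
# (--vollist availability may vary by Purity release; check "purepgroup create" help.)
purepgroup create --vollist SHN-HANA-Data01,SHN-HANA-Log01 SAP-HANA-PG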
Scale Up
A Scale Up SAP HANA system will typically have a single log and data volume (assuming this is a single instance system).
Add the data and log volumes to the protection group.
Once the volumes, hosts or host groups have been added to the protection group, snapshots can be created by selecting the "+" in the top right-hand corner of the "Protection Group Snapshots" section.
Give the protection group snapshot a name and apply any further policies, such as a retention policy or asynchronous replication of the snapshot to a target.
Once the snapshot has been created it will show up in the "Protection Group Snapshots" section.
Selecting the protection group snapshot will show which volume snapshots were created as part of it.
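The same snapshot can also be taken from the Purity command line, which is useful for scripting or scheduling refreshes. A minimal sketch, assuming the protection group created earlier is named SAP-HANA-PG (the --suffix value becomes the snapshot name):

# Create a crash consistent snapshot of every volume in the protection group.
purepgroup snap --suffix dev-refresh SAP-HANA-PG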
Scale Out
In a Scale Out environment there will be multiple data and log volumes, as each worker node will have its own log and data volume.
Once the volumes, hosts or host groups have been added to the protection group, snapshots can be created by selecting the "+" in the top right-hand corner of the "Protection Group Snapshots" section.
Give the protection group snapshot a name and apply any further policies, such as a retention policy or asynchronous replication of the snapshot to a target.
Once the snapshot has been created it will show up in the "Protection Group Snapshots" section.
Selecting the protection group snapshot will show which volume snapshots were created as part of it.
Recovering from a Crash Consistent Storage Snapshot
Recovering from a crash consistent snapshot is not the same as recovering from an application consistent snapshot. There are a number of scenarios to consider:
- Recovering a crash consistent storage snapshot (both data and log) to the same system it was taken from should typically work; the instance and any tenants will be available after startup.
- Recovering a crash consistent storage snapshot to a system with a different name or instance SID is possible but SAP HANA needs to be installed beforehand, and any log and data volumes detached before attaching the recovered volumes.
- (Scale Out only) the source and target topology - the number of worker and standby nodes - must match.
- If the original volumes are overwritten with the storage snapshot they must first be unmounted from the operating system (see the sketch after this list).
- (Scale Out only) If the snapshots are restored to new volumes then the [storage] section of the global.ini configuration file needs to be updated with the new World Wide Identifiers (WWIDs).
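As a reference for the unmount step in the list above, a minimal sketch for a Scale Up system, assuming instance number 00 and the mount points used elsewhere in this article:

# Stop the instance as the <sid>adm user before touching the volumes.
/usr/sap/hostctrl/exe/sapcontrol -nr 00 -function StopSystem HDB
# Unmount the data and log filesystems before overwriting them from the snapshot.
umount /hana/data
umount /hana/log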
Scale Up - Recovering Crash Consistent Storage Snapshots to a different system
In the protection group snapshot, select "Copy Snapshot" for each volume being attached to the new system. Both the data and log volumes must be copied.
Give each new volume to be copied from a snapshot a name.
Once the volume(s) have been copied they will show up in the Storage View under Volumes.
Connect the new volumes to a host.
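The copy and connect steps can also be performed with the Purity command line. A sketch, assuming the protection group snapshot from earlier is named SAP-HANA-PG.dev-refresh, the new host is called hana-target, and the volume names are placeholders (protection group snapshot volumes follow the <pgroup>.<suffix>.<volume> naming; verify against your Purity version):

# Copy the data and log volume snapshots out to new volumes.
purevol copy SAP-HANA-PG.dev-refresh.HANA-Data SHN-HANA-Data-Copy
purevol copy SAP-HANA-PG.dev-refresh.HANA-Log SHN-HANA-Log-Copy
# Connect the new volumes to the target host.
purevol connect --host hana-target SHN-HANA-Data-Copy
purevol connect --host hana-target SHN-HANA-Log-Copy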
When the volumes have been connected, scan for them using the rescan-scsi-bus.sh utility.
rescan-scsi-bus.sh -a
If the devices have been found then the utility will show output similar to the following:
32 new or changed device(s) found.
[6:0:2:1] [6:0:2:2] [6:0:5:1] [6:0:5:2] [6:0:7:1] [6:0:7:2] [6:0:9:1] [6:0:9:2]
[7:0:2:1] [7:0:2:2] [7:0:4:1] [7:0:4:2] [7:0:5:1] [7:0:5:2] [7:0:7:1] [7:0:7:2]
[10:0:2:1] [10:0:2:2] [10:0:6:1] [10:0:6:2] [10:0:7:1] [10:0:7:2] [10:0:9:1] [10:0:9:2]
[11:0:0:1] [11:0:0:2] [11:0:3:1] [11:0:3:2] [11:0:5:1] [11:0:5:2] [11:0:7:1] [11:0:7:2]
If the "multipath -ll" command is used then the new volumes should show up in device-mapper-multipath.
Mount the new volumes to the relevant location.
mount /dev/mapper/3624a93701b16eddfb96a4c3800011c3b /hana/data
mount /dev/mapper/3624a93701b16eddfb96a4c3800011c3c /hana/log
Once mounted the volumes will show up at the relevant locations for SAP HANA.
Filesystem                                            Size  Used  Avail Use%  Mounted on
devtmpfs                                              1.5T     0   1.5T   0%  /dev
tmpfs                                                 2.2T   32K   2.2T   1%  /dev/shm
tmpfs                                                 1.5T   13M   1.5T   1%  /run
tmpfs                                                 1.5T     0   1.5T   0%  /sys/fs/cgroup
/dev/mapper/3624a9370c49a4cb0e2944f4400038775-part2    60G   17G    44G  28%  /
/dev/mapper/3624a9370c49a4cb0e2944f4400031d31         512G   43G   470G   9%  /hana/shared
fileserver.puredoes.local:/mnt/nfs/HANA_Backup        1.0T  153G   872G  15%  /hana/backup
tmpfs                                                 290G   20K   290G   1%  /run/user/469
tmpfs                                                 290G     0   290G   0%  /run/user/468
tmpfs                                                 290G     0   290G   0%  /run/user/1001
tmpfs                                                 290G     0   290G   0%  /run/user/0
/dev/mapper/3624a93701b16eddfb96a4c3800011c3b         5.0T   98G   5.0T   2%  /hana/data
/dev/mapper/3624a93701b16eddfb96a4c3800011c3c         1.0T   73G   952G   8%  /hana/log
Once the volumes have been mounted, SAP HANA needs to be informed that the host name has changed. Running the hdbnsutil utility as the <sid>adm user while the instance is completely shut down will accomplish this.
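As a sketch of the surrounding steps, assuming the <sid>adm user is sh1adm and instance number 00 as used elsewhere in this article:

# Switch to the <sid>adm user.
su - sh1adm
# Confirm the instance is fully stopped before converting the topology;
# GetProcessList should report no running HDB processes.
/usr/sap/hostctrl/exe/sapcontrol -nr 00 -function GetProcessList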
hdbnsutil -convertTopology
If everything is as expected the output will be similar to the below, showing that the host name has been changed from the source to the target.
nameserver sarah:30001 not responding.
Opening persistence ...
sh1adm: no process found
hdbrsutil: no process found
run as transaction master
converting topology from cloned instance...
- keeping instance 00
- changing host hannah to sarah
done.
The instance can be started at this point. Any new or additional tenants will be automatically picked up from the data volume.
/usr/sap/hostctrl/exe/sapcontrol -nr <instance number> -function Start
To verify that the Scale Up SAP HANA system has started as expected, use the sapcontrol utility.
/usr/sap/hostctrl/exe/sapcontrol -nr <instance number> -function GetProcessList
If the instance has been started correctly then all of the processes should show the state "Green".
08.07.2020 05:39:38
GetProcessList
OK
name, description, dispstatus, textstatus, starttime, elapsedtime, pid
hdbdaemon, HDB Daemon, GREEN, Running, 2020 07 08 05:38:58, 0:00:40, 33859
hdbcompileserver, HDB Compileserver, GREEN, Running, 2020 07 08 05:39:05, 0:00:33, 34158
hdbindexserver, HDB Indexserver-SH1, GREEN, Running, 2020 07 08 05:39:05, 0:00:33, 34212
hdbnameserver, HDB Nameserver, GREEN, Running, 2020 07 08 05:38:58, 0:00:40, 33877
hdbpreprocessor, HDB Preprocessor, GREEN, Running, 2020 07 08 05:39:05, 0:00:33, 34161
hdbwebdispatcher, HDB Web Dispatcher, GREEN, Running, 2020 07 08 05:39:17, 0:00:21, 41740
hdbindexserver, HDB Indexserver-SH2, GREEN, Running, 2020 07 08 05:39:05, 0:00:33, 34215
hdbindexserver, HDB Indexserver-SH3, GREEN, Running, 2020 07 08 05:39:05, 0:00:33, 34218
hdbindexserver, HDB Indexserver-SH4, GREEN, Running, 2020 07 08 05:39:05, 0:00:33, 34221
Scale Out
In the protection group snapshot, select "Copy Snapshot" for each volume being attached to the new system. All of the data and log volumes for the SAP HANA Scale Out deployment must be copied.
Give each new volume an appropriate name.
Connect the volumes to the SAP HANA Scale Out deployment's host group.
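The connect step can also be scripted against the Purity command line; a sketch, assuming a host group named HANA-ScaleOut and copied volume names in the style used later in this section (flag names should be verified against your Purity version):

# Connect each copied volume to the Scale Out host group so every node can see it.
for vol in SHN-HANA-Data01-Copy SHN-HANA-Data02-Copy SHN-HANA-Data03-Copy \
           SHN-HANA-Log01-Copy SHN-HANA-Log02-Copy SHN-HANA-Log03-Copy; do
    purevol connect --hgroup HANA-ScaleOut "$vol"
done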
On all SAP HANA nodes in the Scale Out deployment, scan for the volumes.
rescan-scsi-bus.sh -a
If the volumes have been found then the output should be as follows:
48 new or changed device(s) found.
[0:0:0:249] [0:0:0:250] [0:0:0:251] [0:0:0:252] [0:0:0:253] [0:0:0:254]
[0:0:1:249] [0:0:1:250] [0:0:1:251] [0:0:1:252] [0:0:1:253] [0:0:1:254]
[0:0:4:249] [0:0:4:250]
Executing "multipath -ll" will show the devices in the below form. Take particular note of the first line of the device, noting the text starting with "3624a370". this will be needed for each of the new volumes at a later point.
3624a9370884890ea83bd488200011c82 dm-3 PURE,FlashArray
size=512G features='0' hwhandler='1 alua' wp=rw
`-+- policy='queue-length 0' prio=50 status=active
  |- 0:0:0:254 sdf  8:80   active ready running
  |- 0:0:7:254 sdx  65:112 active ready running
  |- 0:0:4:254 sdr  65:16  active ready running
  |- 0:0:2:254 sdl  8:176  active ready running
  |- 1:0:4:254 sdah 66:16  active ready running
  |- 1:0:9:254 sdaz 67:48  active ready running
  |- 1:0:6:254 sdat 66:208 active ready running
  `- 1:0:5:254 sdan 66:112 active ready running
The SAP HANA Storage Connector API uses the World Wide Identifiers (WWIDs) to ensure the correct volumes are attached to the appropriate hosts. The record of the WWIDs is kept in the /hana/shared/<SID>/global/hdb/custom/config/global.ini configuration file.
It may be necessary to compare against the source system's global.ini configuration file, as each copied volume must be assigned to the same partition number and disk type (data or log) as its source.
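To inspect the current [storage] section directly on a system, something like the following can be used (the SID SH1 and the configuration path from earlier in this article are examples):

# Print the storage connector configuration for the SH1 system.
grep -A 20 '^\[storage\]' /hana/shared/SH1/global/hdb/custom/config/global.ini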
Source System global.ini:
[storage]
ha_provider = hdb_ha.fcClient
partition_*_*__prtype = 5
partition_1_data__wwid = 3624a9370884890ea83bd488200011c47
partition_1_log__wwid = 3624a9370884890ea83bd488200011c4a
partition_2_data__wwid = 3624a9370884890ea83bd488200011c48
partition_2_log__wwid = 3624a9370884890ea83bd488200011c4b
partition_3_data__wwid = 3624a9370884890ea83bd488200011c49
partition_3_log__wwid = 3624a9370884890ea83bd488200011c4c
Source system volumes (using purevol list) and serial numbers:
Name             Size  Source  Created                  Serial
SHN-HANA-Data01  512G  -       2020-07-07 08:20:41 PDT  884890EA83BD488200011C49
SHN-HANA-Data02  512G  -       2020-07-07 08:20:41 PDT  884890EA83BD488200011C48
SHN-HANA-Data03  512G  -       2020-07-07 08:20:40 PDT  884890EA83BD488200011C47
SHN-HANA-Log01   384G  -       2020-07-07 04:28:37 PDT  884890EA83BD488200011C4C
SHN-HANA-Log02   384G  -       2020-07-07 04:28:37 PDT  884890EA83BD488200011C4B
SHN-HANA-Log03   384G  -       2020-07-07 04:28:37 PDT  884890EA83BD488200011C4A
Target system's global.ini:
[storage]
ha_provider = hdb_ha.fcClient
partition_*_*__prtype = 5
partition_1_data__wwid = 3624a9370884890ea83bd488200011c82
partition_1_log__wwid = 3624a9370884890ea83bd488200011c85
partition_2_data__wwid = 3624a9370884890ea83bd488200011c87
partition_2_log__wwid = 3624a9370884890ea83bd488200011c84
partition_3_data__wwid = 3624a9370884890ea83bd488200011c86
partition_3_log__wwid = 3624a9370884890ea83bd488200011c83
Target system volumes (using purevol list) and serial numbers:
Name                  Size  Source           Created                  Serial
SAP-HANA-Data01-Copy  512G  SHN-HANA-Data01  2020-07-08 06:11:16 PDT  884890EA83BD488200011C86
SHN-HANA-Data02-Copy  512G  SHN-HANA-Data02  2020-07-08 06:11:16 PDT  884890EA83BD488200011C87
SHN-HANA-Data03-Copy  512G  SHN-HANA-Data03  2020-07-08 06:11:16 PDT  884890EA83BD488200011C82
SHN-HANA-Log01-Copy   384G  SHN-HANA-Log01   2020-07-08 06:11:16 PDT  884890EA83BD488200011C83
SHN-HANA-Log02-Copy   384G  SHN-HANA-Log02   2020-07-08 06:11:16 PDT  884890EA83BD488200011C84
SHN-HANA-Log03-Copy   384G  SHN-HANA-Log03   2020-07-08 06:11:16 PDT  884890EA83BD488200011C85
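As the listings above show, each WWID in global.ini is simply "3" followed by the Pure NAA prefix "624a9370" and the lower-cased volume serial from purevol list. A small sketch for building the value, using a serial from the target listing above:

# Build the multipath WWID for global.ini from a FlashArray volume serial.
serial=884890EA83BD488200011C86
printf '3624a9370%s\n' "$(printf '%s' "$serial" | tr '[:upper:]' '[:lower:]')"
# Prints 3624a9370884890ea83bd488200011c86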
Note that the topology matching of the source and target Scale Out system follows the below logic:
SHN1 -> SHN5
SHN2 -> SHN6
SHN3 -> SHN7
SHN4 -> SHN8
Once the global.ini file has been updated, log in as the <sid>adm user and use the hdbnsutil utility to update the topology with the new host names.
hdbnsutil -convertTopology
If all of the prerequisites are in place, a successful topology conversion will produce output similar to the below:
sh1adm@SHN5:/usr/sap/SH1/HDB00> hdbnsutil -convertTopology
nameserver shn5:30001 not responding.
checking 1 master lock file(s) ........................................ ok
load(/usr/sap/SH1/HDB00/exe/python_support/hdb_ha/fcClient.py)=1
attached device '/dev/mapper/3624a9370884890ea83bd488200011c82' to path '/hana/data/SH1/mnt00001'
attached device '/dev/mapper/3624a9370884890ea83bd488200011c85' to path '/hana/log/SH1/mnt00001'
Opening persistence ...
sh1adm: no process found
run as transaction master
converting topology from cloned instance...
- keeping instance 00
- changing host shn1 to shn5
- changing host shn2 to shn6
- changing host shn3 to shn7
- changing host shn4 to shn8
detached device '/dev/mapper/3624a9370884890ea83bd488200011c82' from path '/hana/data/SH1/mnt00001'
detached device '/dev/mapper/3624a9370884890ea83bd488200011c85' from path '/hana/log/SH1/mnt00001'
done.
At this point the Scale Out system can be started. This is done using the sapcontrol utility.
/usr/sap/hostctrl/exe/sapcontrol -nr <instance number> -function StartSystem HDB
To check the status of the system, use the sapcontrol utility with the GetSystemInstanceList function.
/usr/sap/hostctrl/exe/sapcontrol -nr 00 -function GetSystemInstanceList
When the system has been successfully started, all of the nodes should show as "GREEN".
hostname, instanceNr, httpPort, httpsPort, startPriority, features, dispstatus
shn5, 0, 50013, 50014, 0.3, HDB|HDB_WORKER, GREEN
shn8, 0, 50013, 50014, 0.3, HDB|HDB_STANDBY, GREEN
shn6, 0, 50013, 50014, 0.3, HDB|HDB_WORKER, GREEN
shn7, 0, 50013, 50014, 0.3, HDB|HDB_WORKER, GREEN