Recovery from Data Snapshots without Access to SAP Tools
Recovery from a Data Snapshot typically uses either SAP HANA Studio or SAP HANA Cockpit. In the event that access to these administration tools is not possible, recovery can be performed by following this guide. This process can also be used in scenarios where there is no access to the SAP HANA Backup Catalog or any backups of it.
The requirements for this process are as follows:
- Access to a bash terminal or SSH connection for each of the SAP HANA systems required during the recovery (credentials for both the root and <sid>adm users are required).
- SSH or GUI access to the FlashArray.
Before getting started, if the instance is not already shut down, shut it down using the sapcontrol utility:
Scale Up
/usr/sap/hostctrl/exe/sapcontrol -nr <instance number> -function Stop
Scale Out
/usr/sap/hostctrl/exe/sapcontrol -nr <instance number> -function StopSystem HDB
In a Scale Out environment, the sapcontrol stop command needs to be run on each node in the landscape.
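As an illustration only, the stop command could be issued to all nodes from a single host over SSH. The node names (shn1-shn4) and instance number 00 below are assumptions matching the examples later in this guide:
for node in shn1 shn2 shn3 shn4; do
  ssh <sid>adm@${node} "/usr/sap/hostctrl/exe/sapcontrol -nr 00 -function StopSystem HDB"
done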
Wait until the instance is fully stopped by checking the status with the sapcontrol command utility:
Scale Up
/usr/sap/hostctrl/exe/sapcontrol -nr <instance number> -function GetProcessList
When the instance is fully shut down it should display the output:
07.07.2020 04:24:07
GetProcessList
OK
name, description, dispstatus, textstatus, starttime, elapsedtime, pid
hdbdaemon, HDB Daemon, GRAY, Stopped, , , 20982
Scale Out
/usr/sap/hostctrl/exe/sapcontrol -nr <instance number> -function GetSystemInstanceList
When the instance is fully shut down it should display the output:
08.07.2020 06:30:48
GetSystemInstanceList
OK
hostname, instanceNr, httpPort, httpsPort, startPriority, features, dispstatus
shn1, 0, 50013, 50014, 0.3, HDB|HDB_WORKER, GRAY
shn2, 0, 50013, 50014, 0.3, HDB|HDB_STANDBY, GRAY
shn3, 0, 50013, 50014, 0.3, HDB|HDB_WORKER, GRAY
shn4, 0, 50013, 50014, 0.3, HDB|HDB_WORKER, GRAY
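As an alternative to polling GetProcessList or GetSystemInstanceList manually, sapcontrol also provides a WaitforStopped function that blocks until the instance processes have stopped or the timeout expires. A sketch, assuming instance number 00, a 600-second timeout, and a 10-second polling interval:
/usr/sap/hostctrl/exe/sapcontrol -nr 00 -function WaitforStopped 600 10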
Step 1. Unmount the SAP HANA Data Volume(s)
Scale Up
To find which volumes are mounted, execute the df command in a bash terminal or SSH session.
df
This should give the location of all mount points and volumes in the filesystem. From the output below it can be seen that the SAP HANA data volume is mounted at /hana/data.
Filesystem                                           1K-blocks       Used  Available Use% Mounted on
devtmpfs                                            2377876496          0 2377876496   0% /dev
tmpfs                                               3568402432         32 3568402400   1% /dev/shm
tmpfs                                               2377886844      12376 2377874468   1% /run
tmpfs                                               2377886844          0 2377886844   0% /sys/fs/cgroup
/dev/mapper/3624a9370c49a4cb0e2944f440002d735-part2   62883840   18675356   44208484  30% /
/dev/mapper/3624a9370c49a4cb0e2944f440002dc76        536608768   44161756  492447012   9% /hana/shared
tmpfs                                                475577368         24  475577344   1% /run/user/469
tmpfs                                                475577368          0  475577368   0% /run/user/468
tmpfs                                                475577368          0  475577368   0% /run/user/0
/dev/mapper/3624a93701b16eddfb96a4c3800011c30       5366622188  583411380 4783210808  11% /hana/data
/dev/mapper/3624a93701b16eddfb96a4c3800011c31       1073217536  550628920  522588616  52% /hana/log
fileserver.puredoes.local:/mnt/nfs/HANA_Backup       524032000  142322688  381709312  28% /hana/backup
Once the location has been found, use the umount command to unmount the volume:
umount /hana/data
The /hana/data volume should no longer be shown when executing df once again:
Filesystem                                           1K-blocks       Used  Available Use% Mounted on
devtmpfs                                            2377876496          0 2377876496   0% /dev
tmpfs                                               3568402432         32 3568402400   1% /dev/shm
tmpfs                                               2377886844      12376 2377874468   1% /run
tmpfs                                               2377886844          0 2377886844   0% /sys/fs/cgroup
/dev/mapper/3624a9370c49a4cb0e2944f440002d735-part2   62883840   18675192   44208648  30% /
/dev/mapper/3624a9370c49a4cb0e2944f440002dc76        536608768   44161756  492447012   9% /hana/shared
tmpfs                                                475577368         24  475577344   1% /run/user/469
tmpfs                                                475577368          0  475577368   0% /run/user/468
tmpfs                                                475577368          0  475577368   0% /run/user/0
/dev/mapper/3624a93701b16eddfb96a4c3800011c31       1073217536  550628920  522588616  52% /hana/log
fileserver.puredoes.local:/mnt/nfs/HANA_Backup       524032000  142345216  381686784  28% /hana/backup
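If the umount command fails with a "target is busy" error, the processes still holding the mount can be identified (and stopped) before retrying. A sketch using fuser:
fuser -vm /hana/data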
Scale Out
Each worker node in the Scale Out landscape will have an SAP HANA data volume attached to it. The Storage API connector will also automatically connect data volumes to standby nodes in the event of a worker node failing. What this means is that any data volume could be attached to any node at any point in time.
It is recommended to connect to each node in the Scale Out landscape, including standby nodes, to ascertain where the data volumes are attached. Running the df command on each node will give the required mount point information.
df
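As an alternative to logging in to each node interactively, the mount information can be collected in one pass over SSH. This is a sketch only; the node names shn1-shn4 are assumptions matching the examples below:
for node in shn1 shn2 shn3 shn4; do
  echo "=== ${node} ==="
  ssh root@${node} "df | grep hana"
done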
Node 1
From the output of Node 1 it can be seen that the SAP HANA Data volume is mounted at /hana/data/SH1/mnt00001.
Filesystem                                     1K-blocks      Used Available Use% Mounted on
devtmpfs                                       264013600         0 264013600   0% /dev
tmpfs                                          397605660         4 397605656   1% /dev/shm
tmpfs                                          264021864     10432 264011432   1% /run
tmpfs                                          264021864         0 264021864   0% /sys/fs/cgroup
/dev/sdy2                                       62883840  14807788  48076052  24% /
fileserver.puredoes.local:/mnt/nfs/SHN_Shared  524032000 153831424 370200576  30% /hana/shared
fileserver.puredoes.local:/mnt/nfs/SHN_Backup  524032000 153831424 370200576  30% /hana/backup
tmpfs                                           52804372        24  52804348   1% /run/user/469
tmpfs                                           52804372         0  52804372   0% /run/user/468
tmpfs                                           52804372         0  52804372   0% /run/user/0
/dev/mapper/3624a9370884890ea83bd488200011c47  536608768 110732156 425876612  21% /hana/data/SH1/mnt00001
/dev/mapper/3624a9370884890ea83bd488200011c4a  402456576  95926824 306529752  24% /hana/log/SH1/mnt00001
Node 2
From the output of Node 2 it can be seen that the SAP HANA Data volume is mounted at /hana/data/SH1/mnt00002.
Filesystem                                     1K-blocks      Used Available Use% Mounted on
devtmpfs                                       264013600         0 264013600   0% /dev
tmpfs                                          397605660         4 397605656   1% /dev/shm
tmpfs                                          264021864     10436 264011428   1% /run
tmpfs                                          264021864         0 264021864   0% /sys/fs/cgroup
/dev/sdy2                                       62883840  14570096  48313744  24% /
Fileserver.puredoes.local:/mnt/nfs/SHN_Backup  524032000 153831424 370200576  30% /hana/backup
Fileserver.puredoes.local:/mnt/nfs/SHN_Shared  524032000 153831424 370200576  30% /hana/shared
tmpfs                                           52804372        24  52804348   1% /run/user/469
tmpfs                                           52804372         0  52804372   0% /run/user/468
tmpfs                                           52804372         0  52804372   0% /run/user/0
/dev/mapper/3624a9370884890ea83bd488200011c48  536608768 113832192 422776576  22% /hana/data/SH1/mnt00002
/dev/mapper/3624a9370884890ea83bd488200011c4b  402456576 112637136 289819440  28% /hana/log/SH1/mnt00002
Node 3
From the output of Node 3, it can be seen that the SAP HANA Data volume is mounted at /hana/data/SH1/mnt00003.
Filesystem                                     1K-blocks      Used Available Use% Mounted on
devtmpfs                                       264013600         0 264013600   0% /dev
tmpfs                                          397605660         4 397605656   1% /dev/shm
tmpfs                                          264021864     10436 264011428   1% /run
tmpfs                                          264021864         0 264021864   0% /sys/fs/cgroup
/dev/sdy2                                       62883840  14570832  48313008  24% /
Fileserver.puredoes.local:/mnt/nfs/SHN_Backup  524032000 153834496 370197504  30% /hana/backup
Fileserver.puredoes.local:/mnt/nfs/SHN_Shared  524032000 153834496 370197504  30% /hana/shared
tmpfs                                           52804372        20  52804352   1% /run/user/469
tmpfs                                           52804372         0  52804372   0% /run/user/468
tmpfs                                           52804372         0  52804372   0% /run/user/0
/dev/mapper/3624a9370884890ea83bd488200011c49  536608768 113978852 422629916  22% /hana/data/SH1/mnt00003
/dev/mapper/3624a9370884890ea83bd488200011c4c  402456576 112637136 289819440  28% /hana/log/SH1/mnt00003
Node 4
From the output of Node 4, it can be seen that there is no data or log volume, making this a standby node in the scale out landscape.
Filesystem                                     1K-blocks      Used Available Use% Mounted on
devtmpfs                                       264013600         0 264013600   0% /dev
tmpfs                                          397605660         4 397605656   1% /dev/shm
tmpfs                                          264021864     10308 264011556   1% /run
tmpfs                                          264021864         0 264021864   0% /sys/fs/cgroup
/dev/sdz2                                       62883840  14540732  48343108  24% /
Fileserver.puredoes.local:/mnt/nfs/SHN_Backup  524032000 153834496 370197504  30% /hana/backup
Fileserver.puredoes.local:/mnt/nfs/SHN_Shared  524032000 153834496 370197504  30% /hana/shared
tmpfs                                           52804372        20  52804352   1% /run/user/469
tmpfs                                           52804372         0  52804372   0% /run/user/468
tmpfs                                           52804372         0  52804372   0% /run/user/0
Once all of the data volume mount points and node locations have been identified, they need to be unmounted using the umount command.
Node 1
umount /hana/data/SH1/mnt00001
Node 2
umount /hana/data/SH1/mnt00002
Node 3
umount /hana/data/SH1/mnt00003
Step 2. Recover the Snapshot(s)
Copy the Snapshot to a New Volume
Using the FlashArray Web GUI, navigate to the Storage view and, under Volumes, identify the SAP HANA data volume(s). The volume in the image below is the SAP-HANA-Data volume for a Scale Up instance. Focus on the Volume Snapshots section and identify the point in time for the recovery using the "Created" field. Note also that snapshot names in this scenario are shown as "SAP-HANA-Data-SAPHANA-BackupID". In a normal recovery using the SAP HANA backup catalog, the backup ID can be used to match the recovery point to a volume snapshot.
In the Volume Snapshots view, select the ellipsis (three dots) in the upper right-hand corner to expose the options. In this scenario, the volume snapshot is going to be copied to a new volume. It is also possible to overwrite the existing volume with the snapshot.
Give the new volume a name. If a volume exists with the same name, then it may be necessary to overwrite it by setting the overwrite option.
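If only SSH access to the FlashArray is available, the same copy can be performed with the Purity command line. This is a sketch only, assuming the snapshot's full name is the volume name plus a suffix (SAP-HANA-Data.SAPHANA-<BackupID>) and the new volume is named SAP-HANA-Data-Recovery; exact CLI syntax can vary between Purity versions:
purevol copy SAP-HANA-Data.SAPHANA-<BackupID> SAP-HANA-Data-Recovery
To overwrite an existing volume of the same name instead, the --overwrite flag can be added.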
Once completed, the new volume should now show up in the list of volumes on the FlashArray; this can be seen in Storage, under the Volumes section:
Disconnect the Old Volume (if the Volume has been Copied)
Once the volume has been copied, the original SAP HANA Data volume needs to be disconnected from the host and the new volume connected in its place. This can be done in Storage, under Hosts: find the relevant SAP HANA host. To disconnect the original volume, either select the ellipsis (three dots) in the upper right-hand corner of the "Connected Volumes" section and select "Disconnect", or use the "x" next to the volume.
Prompt when selecting the "x":
Selecting the ellipsis (three dots) and using "Disconnect".
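The disconnect can also be performed from the Purity command line. A sketch only, using an illustrative host name SAP-HANA-Host and the original volume name SAP-HANA-Data:
purehost disconnect --vol SAP-HANA-Data SAP-HANA-Host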
Once the original SAP HANA Data volume has been disconnected, the operating system must be informed that the volume has been removed. To do so, execute the following to check for and remove missing devices:
rescan-scsi-bus.sh -r
If the volumes have been removed properly, the end of the command output will look similar to this:
16 device(s) removed.
[2:0:0:1] [2:0:2:1] [2:0:5:1] [2:0:7:1]
[4:0:1:1] [4:0:2:1] [4:0:5:1] [4:0:8:1]
[7:0:2:1] [7:0:3:1] [7:0:6:1] [7:0:7:1]
[8:0:1:1] [8:0:3:1] [8:0:8:1] [8:0:9:1]
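Optionally, confirm that the multipath map for the removed volume is gone; if a stale map lingers, it can be flushed. The WWID below is the original data volume from the Scale Up example, shown for illustration only:
multipath -ll | grep 3624a93701b16eddfb96a4c3800011c30
multipath -f 3624a93701b16eddfb96a4c3800011c30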
Connect the New Volume (if the Volume has been Copied)
In Storage, under the Hosts view, find the host to connect the new SAP HANA Data volume to. Find the "Connected Volumes" section in the host properties and select the ellipsis (three dots) in the top right-hand corner.
In the menu which appears, select "Connect...".
Select the new SAP HANA Data Volume and then select "Connect".
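As with the disconnect, this can also be done from the Purity command line. A sketch with the same illustrative names as before:
purehost connect --vol SAP-HANA-Data-Recovery SAP-HANA-Host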
The volume should now be connected. At this point the operating system needs to scan for the new volume. Use the following command to scan for any new volumes:
rescan-scsi-bus.sh -a
If the new volume has been found then the output should contain something similar to the below:
16 new or changed device(s) found.
[2:0:0:1] [2:0:2:1] [2:0:5:1] [2:0:7:1]
[4:0:1:1] [4:0:2:1] [4:0:5:1] [4:0:8:1]
[7:0:2:1] [7:0:3:1] [7:0:6:1] [7:0:7:1]
[8:0:1:1] [8:0:3:1] [8:0:8:1] [8:0:9:1]
Mount the Data Volume(s)
Scale Up
Whether the volume has been copied or overwritten during the recovery, the following steps are the same for a Scale Up environment.
Identify the SAP HANA Data volume using the following command:
multipath -ll
The output will show a number of devices, but the first line of the device information contains the device serial number (this is the text starting with 3624a9370).
3624a93701b16eddfb96a4c3800011c36 dm-5 PURE,FlashArray
size=5.0T features='0' hwhandler='1 alua' wp=rw
`-+- policy='queue-length 0' prio=50 status=active
  |- 2:0:7:1  sds  65:32  active ready running
  |- 2:0:5:1  sdq  65:0   active ready running
  |- 2:0:2:1  sdo  8:224  active ready running
  |- 2:0:0:1  sdm  8:192  active ready running
  |- 4:0:5:1  sdy  65:128 active ready running
  |- 4:0:2:1  sdw  65:96  active ready running
  |- 4:0:1:1  sdu  65:64  active ready running
  |- 4:0:8:1  sdaa 65:160 active ready running
  |- 7:0:7:1  sdai 66:32  active ready running
  |- 7:0:6:1  sdag 66:0   active ready running
  |- 7:0:3:1  sdae 65:224 active ready running
  |- 7:0:2:1  sdac 65:192 active ready running
  |- 8:0:1:1  sdak 66:64  active ready running
  |- 8:0:9:1  sdaq 66:160 active ready running
  |- 8:0:8:1  sdao 66:128 active ready running
  `- 8:0:3:1  sdam 66:96  active ready running
By viewing the volume information in the FlashArray GUI, the serial number can be matched to the device.
Once the device has been identified, it needs to be mounted to the SAP HANA data volume location identified in Step 1.
mount /dev/mapper/3624a93701b16eddfb96a4c3800011c36 /hana/data
If the device has been mounted successfully, it will be shown when executing df:
Filesystem                                           1K-blocks       Used  Available Use% Mounted on
devtmpfs                                            2377876496          0 2377876496   0% /dev
tmpfs                                               3568402432         32 3568402400   1% /dev/shm
tmpfs                                               2377886844      20568 2377866276   1% /run
tmpfs                                               2377886844          0 2377886844   0% /sys/fs/cgroup
/dev/mapper/3624a9370c49a4cb0e2944f440002d735-part2   62883840   18672736   44211104  30% /
/dev/mapper/3624a9370c49a4cb0e2944f440002dc76        536608768   44161756  492447012   9% /hana/shared
tmpfs                                                475577368         24  475577344   1% /run/user/469
tmpfs                                                475577368          0  475577368   0% /run/user/468
tmpfs                                                475577368          0  475577368   0% /run/user/0
/dev/mapper/3624a93701b16eddfb96a4c3800011c31       1073217536  550628920  522588616  52% /hana/log
fileserver.puredoes.local:/mnt/nfs/HANA_Backup       524032000  154021888  370010112  30% /hana/backup
/dev/mapper/3624a93701b16eddfb96a4c3800011c36       5366622188  583412144 4783210044  11% /hana/data
Once the recovery is complete, it is recommended to set the device to mount automatically at startup by adding or updating the corresponding entry in /etc/fstab.
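A minimal sketch of such an entry, assuming an XFS filesystem (verify the actual filesystem type with mount or blkid) and the device serial from the example above:
/dev/mapper/3624a93701b16eddfb96a4c3800011c36  /hana/data  xfs  defaults,nofail  0 0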
Scale Out
In a Scale Out scenario, if the volumes have been overwritten, no further action is needed.
If the volumes have been copied to new volumes, additional steps are required to inform SAP HANA that the World Wide Identifiers (WWIDs) have changed.
To update the WWIDs for SAP HANA, open the following file in a text editor: /hana/shared/<SID>/global/hdb/custom/config/global.ini.
Identify the [storage] section; this contains the WWIDs. The "partition_N_data__wwid" entries need to be updated with the new volume values.
[storage]
ha_provider = hdb_ha.fcClient
partition_*_*__prtype = 5
partition_1_data__wwid = 3624a9370884890ea83bd488200011c47
partition_1_log__wwid = 3624a9370884890ea83bd488200011c4a
partition_2_data__wwid = 3624a9370884890ea83bd488200011c48
partition_2_log__wwid = 3624a9370884890ea83bd488200011c4b
partition_3_data__wwid = 3624a9370884890ea83bd488200011c49
partition_3_log__wwid = 3624a9370884890ea83bd488200011c4c
To get the new volume values, execute the following:
multipath -ll
The WWID for each volume is the first identifier on the first line of each device-mapper device. These values are used to update the global.ini file.
3624a9370884890ea83bd488200011c61 dm-3 PURE,FlashArray
size=512G features='0' hwhandler='1 alua' wp=rw
`-+- policy='queue-length 0' prio=50 status=active
  |- 0:0:0:252 sdd  8:48   active ready running
  |- 0:0:1:252 sdj  8:144  active ready running
  |- 0:0:4:252 sdp  8:240  active ready running
  |- 0:0:7:252 sdv  65:80  active ready running
  |- 1:0:4:252 sdaf 65:240 active ready running
  |- 1:0:5:252 sdal 66:80  active ready running
  |- 1:0:6:252 sdar 66:176 active ready running
  `- 1:0:7:252 sdax 67:16  active ready running
3624a9370884890ea83bd488200011c60 dm-4 PURE,FlashArray
size=512G features='0' hwhandler='1 alua' wp=rw
`-+- policy='queue-length 0' prio=50 status=active
  |- 0:0:0:253 sde  8:64   active ready running
  |- 0:0:1:253 sdk  8:160  active ready running
  |- 0:0:4:253 sdq  65:0   active ready running
  |- 0:0:7:253 sdw  65:96  active ready running
  |- 1:0:4:253 sdag 66:0   active ready running
  |- 1:0:5:253 sdam 66:96  active ready running
  |- 1:0:6:253 sdas 66:192 active ready running
  `- 1:0:7:253 sday 67:32  active ready running
3624a9370884890ea83bd488200011c5f dm-5 PURE,FlashArray
size=512G features='0' hwhandler='1 alua' wp=rw
`-+- policy='queue-length 0' prio=50 status=active
  |- 0:0:0:254 sdf  8:80   active ready running
  |- 0:0:1:254 sdl  8:176  active ready running
  |- 0:0:4:254 sdr  65:16  active ready running
  |- 0:0:7:254 sdx  65:112 active ready running
  |- 1:0:4:254 sdah 66:16  active ready running
  |- 1:0:5:254 sdan 66:112 active ready running
  |- 1:0:6:254 sdat 66:208 active ready running
  `- 1:0:7:254 sdaz 67:48  active ready running
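To list only the first line of each device (which contains the WWID), the output can be filtered, for example:
multipath -ll | grep PURE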
Once the global.ini file has been updated it should have the new WWID values:
[storage]
ha_provider = hdb_ha.fcClient
partition_*_*__prtype = 5
partition_1_data__wwid = 3624a9370884890ea83bd488200011c61
partition_1_log__wwid = 3624a9370884890ea83bd488200011c4a
partition_2_data__wwid = 3624a9370884890ea83bd488200011c60
partition_2_log__wwid = 3624a9370884890ea83bd488200011c4b
partition_3_data__wwid = 3624a9370884890ea83bd488200011c5f
partition_3_log__wwid = 3624a9370884890ea83bd488200011c4c
Once the global.ini file has been updated, if the volumes have been copied, then the master nameserver needs to be identified and the SAP HANA instance on it started.
View the text of the following file: /hana/shared/<SID>/global/hdb/custom/config/nameserver.ini. Identify the [landscape] section and find the "active_master" property. The value of this property is the node that the instance gets started on.
[landscape]
id = d65820d1-16fd-014d-a076-140d78db92ca
master = shn1:30001 shn2:30001 shn4:30001
worker = shn1 shn2 shn3
active_master = shn1:30001
standby = shn4
roles_shn4 = standby
roles_shn2 = worker
roles_shn3 = worker
roles_shn1 = worker
workergroups_shn4 = default
failovergroup_shn4 = default
workergroups_shn2 = default
failovergroup_shn2 = default
workergroups_shn3 = default
failovergroup_shn3 = default
workergroups_shn1 = default
failovergroup_shn1 = default
Further recovery must take place on the active_master node. All commands executed for the Scale Out environment need to be run from this master nameserver.
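A quick way to read the active master from the command line, using the same file path as above:
grep active_master /hana/shared/<SID>/global/hdb/custom/config/nameserver.ini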
Step 3. Force Recovery of the SystemDB from the Command Line
When no SAP HANA administration tools are available, SAP provides a Python script called "recoverSys.py" which can be used to force a system recovery. It is invoked through the "HDBSettings.sh" shell script.
The recoverSys.py and HDBSettings.sh scripts can only be executed by the <sid>adm user.
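If currently logged in as root, switch to the <sid>adm user first. For example, assuming the SID SH1 used throughout this guide:
su - sh1adm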
When logged in as the <sid>adm user, execute the following (on the master nameserver for Scale Out landscapes):
/usr/sap/<SID>/HDB<instance number>/HDBSettings.sh /usr/sap/<SID>/HDB<instance number>/exe/python_support/recoverSys.py --command="RECOVER DATA USING SNAPSHOT CLEAR LOG"
Scale Up Example and output:
/usr/sap/SH1/HDB00/HDBSettings.sh /usr/sap/SH1/HDB00/exe/python_support/recoverSys.py --command "RECOVER DATA USING SNAPSHOT CLEAR LOG"
[139672890110912, 0.001] >> starting recoverSys (at Tue Jul 7 07:18:52 2020)
[139672890110912, 0.001] args: ()
[139672890110912, 0.001] keys: {'command': 'RECOVER DATA USING SNAPSHOT CLEAR LOG'}
using logfile /usr/sap/SH1/HDB00/hannah/trace/backup.log
recoverSys started: ============2020-07-07 07:18:52 ============
testing master: hannah
hannah is master
shutdown database, timeout is 120
stop system
stop system on: hannah
stopping system: 2020-07-07 07:18:52
stopped system: 2020-07-07 07:18:52
creating file recoverInstance.sql
restart database
restart master nameserver: 2020-07-07 07:18:57
start system: hannah
sapcontrol parameter: ['-function', 'Start']
sapcontrol returned successfully:
2020-07-07T07:19:20-07:00  P0100598      17329a4f28e INFO    RECOVERY RECOVER DATA finished successfully
recoverSys finished successfully: 2020-07-07 07:19:21
[139672890110912, 29.380] 0
[139672890110912, 29.380] << ending recoverSys, rc = 0 (RC_TEST_OK), after 29.379 secs
Scale Out Example and output:
sh1adm@SHN1:/usr/sap/SH1/HDB00> /usr/sap/SH1/HDB00/HDBSettings.sh /usr/sap/SH1/HDB00/exe/python_support/recoverSys.py --command "RECOVER DATA USING SNAPSHOT CLEAR LOG"
[140479352415168, 0.002] >> starting recoverSys (at Tue Jul 7 08:39:36 2020)
[140479352415168, 0.002] args: ()
[140479352415168, 0.002] keys: {'command': 'RECOVER DATA USING SNAPSHOT CLEAR LOG'}
using logfile /usr/sap/SH1/HDB00/shn1/trace/backup.log
recoverSys started: ============2020-07-07 08:39:36 ============
testing master: shn1
shn1 is master
shutdown database, timeout is 120
stop system
stop system on: shn1
stop system on: shn4
stop system on: shn2
stop system on: shn3
stopping system: 2020-07-07 08:39:37
stopped system: 2020-07-07 08:39:41
creating file recoverInstance.sql
restart database
restart master nameserver: 2020-07-07 08:39:46
start system: shn1
sapcontrol parameter: ['-function', 'Start']
sapcontrol returned successfully:
2020-07-07T08:40:55-07:00  P0044462      17329ef9b2d INFO    RECOVERY RECOVER DATA finished successfully
starting: 2020-07-07 08:40:55
start system: shn4
start system: shn2
start system: shn3
recoverSys finished successfully: 2020-07-07 08:40:57
[140479352415168, 80.634] 0
[140479352415168, 80.634] << ending recoverSys, rc = 0 (RC_TEST_OK), after 80.632 secs
Step 4. Retrieve the List of Tenants and Recover each one
While still logged in as the <sid>adm user, execute the following query on the SystemDB using the hdbsql utility:
hdbsql -i 00 -n localhost:30013 -u SYSTEM -p <password> "SELECT DATABASE_NAME FROM M_DATABASES WHERE ACTIVE_STATUS = 'NO'"
The results returned are each of the tenant databases which have been invalidated during the recovery process of the SystemDB.
DATABASE_NAME
"SH1"
"SH2"
"SH3"
"SH4"
To recover the tenants execute the following using the hdbsql utility (needs to be run separately for each tenant):
hdbsql -i 00 -n localhost:30013 -u SYSTEM -p <password> "RECOVER DATA FOR <Tenant> USING SNAPSHOT CLEAR LOG"
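Since the statement has to be executed once per tenant, the recovery can also be scripted. This is a sketch only, using the tenant names from the example output above and the same placeholder password:
for TENANT in SH1 SH2 SH3 SH4; do
  hdbsql -i 00 -n localhost:30013 -u SYSTEM -p <password> "RECOVER DATA FOR ${TENANT} USING SNAPSHOT CLEAR LOG"
done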