SAP HANA Host Failover with ActiveCluster and ActiveDR
In a disaster scenario there are a number of failure points to consider. ActiveCluster provides continuous access to storage for any SAP HANA systems connected to FlashArray, but there is always a risk of losing access to the compute infrastructure, which requires a business continuity solution to handle that scenario. ActiveDR provides a near-zero recovery point objective by offering near-synchronous replication of storage volumes without impacting key performance indicators, but it requires manual intervention to bring a remote SAP HANA system up in the event of a failure. Asynchronous replication is the replication of volume snapshots to a different array at regular intervals.
The purpose of this guide is to explain how SAP HANA can be deployed with ActiveCluster and ActiveDR to provide availability for both the storage and compute objects in the landscape.
Manual System Failover For ActiveCluster
The SAP HANA environment is configured as follows:
Scale Up Configuration
Initial Configuration
1. Install the SAP HANA Scale Up system on both compute 1 and compute 2.
   - Use the same instance SID for both installations.
   - Use the same <sid>adm user password on both systems.
   - Ensure the data and log volumes are mounted to the same location.
2. Configure the data, catalog, and log streaming backup locations on both SAP HANA systems.
   - If using backint, ensure that both systems have access to the same backup set in the ISV.
   - If using a filesystem-based backup, ensure that either a shared filesystem or an NFS mount point is used, and that any subfolders in the backup location are owned by the <sid>adm user.
3. Stop the SAP HANA instances on all systems by running the following command on each of them:
/usr/sap/hostctrl/exe/sapcontrol -nr <instance number> -function Stop
4. Wait until the instances have stopped on each system by using the following command:
/usr/sap/hostctrl/exe/sapcontrol -nr <instance number> -function GetProcessList
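If scripting the shutdown, the check can be wrapped in a simple polling loop. The following is a minimal sketch only - the instance number 00 is a placeholder and should be replaced with the real instance number:
# Poll GetProcessList until no HDB process reports GREEN (i.e. everything has stopped)
while /usr/sap/hostctrl/exe/sapcontrol -nr 00 -function GetProcessList | grep -q GREEN
do
    sleep 10
done
echo "All SAP HANA processes have stopped"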
5. Unmount, disconnect, and delete any SAP HANA log and data volumes from compute 2 (and/or compute 3). The only remaining data and log volumes should be those attached to compute 1.
6. Connect the arrays using synchronous replication, optionally add a third array for asynchronous replication.
7. On Array 1, in Protection -> ActiveCluster, create a new POD, add the volumes to the POD, and then add Array 2 to stretch the POD. At this point the volumes will begin initial synchronization. Once initial synchronization is complete, both arrays should show as online.
8. Once the volumes are online on both arrays, connect the hosts for compute 1 and compute 2 to the volumes in the POD on Array 1 and Array 2.
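Steps 7 and 8 can also be carried out from the Purity//FA CLI. The following is a minimal sketch only, not an authoritative procedure - the POD, volume, and host names (hana-pod, hana-data, hana-log, compute1, compute2) are placeholders, and the exact command options should be confirmed against the CLI reference for the Purity version in use:
# On Array 1: create the POD, move the data and log volumes into it, and stretch it to Array 2
purepod create hana-pod
purevol move hana-data hana-pod
purevol move hana-log hana-pod
purepod add --array <array 2 name> hana-pod
purepod list hana-pod
# Once the POD reports online on both arrays, connect the volumes to the hosts (repeat on Array 2)
purevol connect --host compute1 hana-pod::hana-data
purevol connect --host compute1 hana-pod::hana-log
purevol connect --host compute2 hana-pod::hana-data
purevol connect --host compute2 hana-pod::hana-log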
9. Ensure that the correct DM-Multipath configuration for ActiveCluster has been applied on all nodes - the configuration can be found in the ActiveCluster Requirements and Best Practices guide.
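For reference, a typical /etc/multipath.conf device entry for FlashArray with ALUA (as used with ActiveCluster preferred paths) looks similar to the sketch below. This is illustrative only - always take the authoritative settings from the ActiveCluster Requirements and Best Practices document for the operating system and Purity versions in use:
devices {
    device {
        vendor                "PURE"
        product               "FlashArray"
        path_grouping_policy  group_by_prio
        path_selector         "queue-length 0"
        path_checker          tur
        hardware_handler      "1 alua"
        prio                  alua
        failback              immediate
        fast_io_fail_tmo      10
        dev_loss_tmo          60
        no_path_retry         0
    }
}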
10. If required, set the preferred paths for each host in FlashArray.
11. Start the SAP HANA Instance on Compute 1 using the following command:
/usr/sap/hostctrl/exe/sapcontrol -nr <instance number> -function Start
12. If required for third site availability add the POD to a Protection Group with Array 3 as the target.
Failover Process
In the event of losing a single array (array 1 or 2) there will be no impact on application availability as failover is transparent.
In the event of losing the compute on which the SAP HANA instance is currently running, the process below needs to be followed to bring the SAP HANA instance up on alternative compute.
If restoring service to compute 1 or compute 2 (systems attached to the synchronous replication volumes), the volumes will already exist on the array and should be connected to the relevant host. If the SAP HANA instance is being failed over to a third site that is replicated to using asynchronous replication, the snapshots must first be restored to volumes and connected to the relevant host(s).
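For the asynchronous third-site case, the data and log volumes can be created from the most recent replicated protection group snapshot before being connected to the host. The following is a minimal sketch of the Purity//FA CLI steps - the snapshot, volume, and host names are placeholders, and the snapshot names on the target array will be environment specific:
# On Array 3: list the replicated snapshots and copy the volumes out of the chosen snapshot
purevol list --snap
purevol copy <snapshot name>.<data volume name> hana-data-dr
purevol copy <snapshot name>.<log volume name> hana-log-dr
# Connect the restored volumes to the third-site host
purevol connect --host <third site host> hana-data-dr
purevol connect --host <third site host> hana-log-dr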
Step 1. Ensure the log and data volumes are not mounted anywhere else.
Use the "umount" command in a terminal or ssh connection to unmount any mounted volumes.
umount /hana/data
umount /hana/log
Step 2. Mount the log and data volumes on the system being failed over to. This assumes the volumes were already connected to the host.
To identify the volumes use the multipath -ll command:
multipath -ll
3624a93701b16eddfb96a4c3800011c31 dm-5 PURE,FlashArray
size=1.0T features='0' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=50 status=active
| |- 6:0:5:2 sdp 8:240 active ready running
| |- 6:0:3:2 sdl 8:176 active ready running
| |- 6:0:9:2 sdx 65:112 active ready running
| |- 6:0:7:2 sdt 65:48 active ready running
`-+- policy='queue-length 0' prio=10 status=enabled
  |- 6:0:0:2 sdj 8:144 active ready running
  |- 6:0:4:2 sdn 8:208 active ready running
  |- 6:0:6:2 sdr 65:16 active ready running
  |- 6:0:8:2 sdv 65:80 active ready running
3624a93701b16eddfb96a4c3800011c30 dm-6 PURE,FlashArray
size=5.0T features='0' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=50 status=active
| |- 6:0:7:1 sds 65:32 active ready running
| |- 6:0:3:1 sdk 8:160 active ready running
| |- 6:0:5:1 sdo 8:224 active ready running
| |- 6:0:9:1 sdw 65:96 active ready running
`-+- policy='queue-length 0' prio=10 status=enabled
  |- 6:0:8:1 sdu 65:64 active ready running
  |- 6:0:0:1 sdi 8:128 active ready running
  |- 6:0:4:1 sdm 8:192 active ready running
  |- 6:0:6:1 sdq 65:0 active ready running
Using the above WWIDs (the identifier at the beginning of each device, starting with 3624a9370), the data and log volumes can be mounted on the standby SAP HANA system:
mount /dev/mapper/3624a93701b16eddfb96a4c3800011c30 /hana/data
mount /dev/mapper/3624a93701b16eddfb96a4c3800011c31 /hana/log
Verify the volumes are mounted correctly by using the "df" command:
df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 1.5T 0 1.5T 0% /dev
tmpfs 2.2T 32K 2.2T 1% /dev/shm
tmpfs 1.5T 13M 1.5T 1% /run
tmpfs 1.5T 0 1.5T 0% /sys/fs/cgroup
/dev/mapper/3624a9370c49a4cb0e2944f4400038775-part2 60G 17G 44G 28% /
/dev/mapper/3624a9370c49a4cb0e2944f4400031d31 512G 43G 470G 9% /hana/shared
fileserver.puredoes.local:/mnt/nfs/HANA_Backup 1.0T 153G 872G 15% /hana/backup
tmpfs 290G 20K 290G 1% /run/user/469
tmpfs 290G 0 290G 0% /run/user/468
tmpfs 290G 0 290G 0% /run/user/1001
tmpfs 290G 0 290G 0% /run/user/0
/dev/mapper/3624a93701b16eddfb96a4c3800011c30 5.0T 625G 4.4T 13% /hana/data
/dev/mapper/3624a93701b16eddfb96a4c3800011c31 1.0T 362G 663G 36% /hana/log
Step 3. As the <sid>adm user, use hdbnsutil to change the host name in the system topology.
hdbnsutil -convertTopology
nameserver sarah:30001 not responding.
Opening persistence ...
sh1adm: no process found
hdbrsutil: no process found
run as transaction master
converting topology from cloned instance...
- keeping instance 00
- changing host hannah to sarah
done.
Step 4. Start the SAP HANA system.
Use the sapcontrol utility to start the SAP HANA system:
/usr/sap/hostctrl/exe/sapcontrol -nr <instance number> -function Start
A successful start request will respond as follows:
09.07.2020 08:46:57
Start
OK
To check on the status of the startup process for SAP HANA use the sapcontrol command with the GetProcessList function:
/usr/sap/hostctrl/exe/sapcontrol -nr <instance number> -function GetProcessList
Once each process has started, it should be shown as "GREEN".
09.07.2020 08:47:39
GetProcessList
OK
name, description, dispstatus, textstatus, starttime, elapsedtime, pid
hdbdaemon, HDB Daemon, GREEN, Running, 2020 07 09 08:46:58, 0:00:41, 102275
hdbcompileserver, HDB Compileserver, GREEN, Running, 2020 07 09 08:47:05, 0:00:34, 102544
hdbindexserver, HDB Indexserver-SH1, GREEN, Running, 2020 07 09 08:47:05, 0:00:34, 102599
hdbnameserver, HDB Nameserver, GREEN, Running, 2020 07 09 08:46:58, 0:00:41, 102293
hdbpreprocessor, HDB Preprocessor, GREEN, Running, 2020 07 09 08:47:05, 0:00:34, 102547
hdbwebdispatcher, HDB Web Dispatcher, GREEN, Running, 2020 07 09 08:47:21, 0:00:18, 114411
hdbindexserver, HDB Indexserver-SH2, GREEN, Running, 2020 07 09 08:47:05, 0:00:34, 102602
hdbindexserver, HDB Indexserver-SH3, GREEN, Running, 2020 07 09 08:47:05, 0:00:34, 102605
hdbindexserver, HDB Indexserver-SH4, GREEN, Running, 2020 07 09 08:47:05, 0:00:34, 102608
Scale Out Configuration
Initial Configuration
1. Install the SAP HANA Scale Out system on both compute 1 and compute 2. (Note that compute 1 and compute 2 are in this case two separate and distinct groups of servers which match one another: if compute 1 is made up of 4 nodes (3 workers and 1 standby), then compute 2 needs to be made up of the same number of nodes with the same topology.)
   - Use the same instance SID for both installations.
   - Use the same <sid>adm user password on both sets of systems.
   - Keep the shared location (typically an NFS mount point) for the source and target systems completely separate.
   - Ensure the global.ini file found at /hana/shared/<sid>/global/hdb/custom/config/global.ini is exactly the same on both sets of hosts.
2. Configure the data, catalog, and log streaming backup locations on both SAP HANA systems.
   - If using backint, ensure that both sets of systems have access to the same backup set in the ISV.
   - If using a filesystem-based backup, ensure that either a shared filesystem or an NFS mount point is used, and that any subfolders in the backup location are owned by the <sid>adm user.
3. Stop the SAP HANA instances on both sets of systems by running the following command on each of them:
/usr/sap/hostctrl/exe/sapcontrol -nr <instance number> -function StopSystem HDB
4. Wait until the instances have stopped on each system by using the following command:
/usr/sap/hostctrl/exe/sapcontrol -nr <instance number> -function GetSystemInstanceList
When all of the instances have stopped the output will be shown as below:
13.07.2020 03:16:42
GetSystemInstanceList
OK
hostname, instanceNr, httpPort, httpsPort, startPriority, features, dispstatus
shn1, 0, 50013, 50014, 0.3, HDB|HDB_WORKER, GRAY
shn4, 0, 50013, 50014, 0.3, HDB|HDB_STANDBY, GRAY
shn3, 0, 50013, 50014, 0.3, HDB|HDB_WORKER, GRAY
shn2, 0, 50013, 50014, 0.3, HDB|HDB_WORKER, GRAY
5. SAP HANA will unmount the log and data volumes when the systems are stopped, but sometimes this does not happen, so it is worth checking whether any volumes are still mounted on any node at this point. During installation it is possible to use either a different set of volumes for each scale out landscape or the same volumes for both. If different volumes are used, then before going live with this solution one set of volumes must be chosen and the alternatives destroyed. If the same volumes are used on both the source and target, then during installation the original SAP HANA Scale Out cluster must be shut down and its volumes unmounted before installing on the target.
If using the alternative volumes method to install SAP HANA in the scale out landscape, then after deleting the alternative volumes the global.ini file on both the source and target sets of systems must be updated to reflect the WWIDs of the permanent volumes being kept.
6. Connect the arrays using synchronous replication, optionally add a third array for asynchronous replication.
7. On Array 1, in Protection -> ActiveCluster, create a new POD, add the volumes to the POD, and then add Array 2 to stretch the POD. At this point the volumes will begin initial synchronization. Once initial synchronization is complete, both arrays should show as online.
8. Once the volumes are online on both arrays, connect the hosts for compute 1 and compute 2 to the volumes in the POD on Array 1 and Array 2.
9. Ensure that the correct DM-Multipath configuration for ActiveCluster has been applied on all nodes - the configuration can be found in the ActiveCluster Requirements and Best Practices guide.
10. If required, set the preferred paths for each host in FlashArray.
11. Start the SAP HANA Instance on Compute 1 using the following command:
/usr/sap/hostctrl/exe/sapcontrol -nr <instance number> -function StartSystem HDB
12. If required for third site availability add the POD to a Protection Group with Array 3 as the target.
Failover Process
In the event of losing a single array (array 1 or 2) there will be no impact on application availability as failover is transparent.
In the event of losing the compute on which the SAP HANA instance is currently running, the process below needs to be followed to bring the SAP HANA instance up on alternative compute.
If restoring service to compute 1 or compute 2 (systems attached to the synchronous replication volumes), the volumes will already exist on the array and should be connected to the relevant host. If the SAP HANA instance is being failed over to a third site that is replicated to using asynchronous replication, the snapshots must first be restored to volumes and connected to the relevant host(s).
Step 1. Ensure the log and data volumes are not mounted on any of the source nodes.
Using the "df" command on each node:
df
Example of mounted SAP HANA data and log volumes; note the mount path in a Scale Out landscape using the Storage API Connector is in the form <basepath_datavolumes>/mnt0000#:
Filesystem 1K-blocks Used Available Use% Mounted on
devtmpfs 264013600 0 264013600 0% /dev
tmpfs 397605660 4 397605656 1% /dev/shm
tmpfs 264021864 18840 264003024 1% /run
tmpfs 264021864 0 264021864 0% /sys/fs/cgroup
/dev/sdz2 62883840 14824852 48058988 24% /
Fileserver.puredoes.local:/mnt/nfs/SHN_Backup 1073485824 160157696 913328128 15% /hana/backup
Fileserver.puredoes.local:/mnt/nfs/SHN_Shared 1073485824 160157696 913328128 15% /hana/shared
tmpfs 52804372 24 52804348 1% /run/user/469
tmpfs 52804372 0 52804372 0% /run/user/468
tmpfs 52804372 0 52804372 0% /run/user/1001
tmpfs 52804372 0 52804372 0% /run/user/0
/dev/mapper/3624a9370884890ea83bd488200011c47 536608768 5362804 531245964 1% /hana/data/SH1/mnt00001
/dev/mapper/3624a9370884890ea83bd488200011c4a 402456576 5762904 396693672 2% /hana/log/SH1/mnt00001
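To check every node in one pass, a small loop over the hosts can be used. This is a minimal sketch assuming passwordless SSH between the nodes and the host names shn1 to shn4 used in this example:
for node in shn1 shn2 shn3 shn4
do
    echo "--- $node ---"
    ssh $node "df | grep -E '/hana/(data|log)' || echo 'no data or log volumes mounted'"
done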
Step 2. (Optional) Verify that the global.ini configuration files on the source and target systems match. More specifically, ensure that the [storage] section with the volume WWIDs is exactly the same. The global.ini file can be located at /hana/shared/<SID>/global/hdb/custom/config/global.ini. A quick way to compare the two files is sketched after the listings below.
Source System
# global.ini last modified 2020-07-09 02:42:08.450368 by /usr/sap/SH1/HDB00/exe/hdbnsutil -initTopology --hostnameResolution=global --workergroup=default --set_user_system_pw
[communication]
listeninterface = .global
[multidb]
mode = multidb
database_isolation = low
singletenant = yes
[persistence]
basepath_datavolumes = /hana/data/SH1
basepath_logvolumes = /hana/log/SH1
basepath_shared = yes
use_mountpoints = yes
[storage]
ha_provider = hdb_ha.fcClient
partition_*_*__prtype = 5
partition_1_data__wwid = 3624a9370884890ea83bd488200011c47
partition_1_log__wwid = 3624a9370884890ea83bd488200011c4a
partition_2_data__wwid = 3624a9370884890ea83bd488200011c48
partition_2_log__wwid = 3624a9370884890ea83bd488200011c4b
partition_3_data__wwid = 3624a9370884890ea83bd488200011c49
partition_3_log__wwid = 3624a9370884890ea83bd488200011c4c
[trace]
ha_fcclient = info
Target System
# global.ini last modified 2020-07-09 03:09:58.950952 by /usr/sap/SH1/HDB00/exe/hdbnsutil -initTopology --hostnameResolution=global --workergroup=default --set_user_system_pw
[communication]
listeninterface = .global
[multidb]
mode = multidb
database_isolation = low
singletenant = yes
[persistence]
basepath_datavolumes = /hana/data/SH1
basepath_logvolumes = /hana/log/SH1
basepath_shared = yes
use_mountpoints = yes
[storage]
ha_provider = hdb_ha.fcClient
partition_*_*__prtype = 5
partition_1_data__wwid = 3624a9370884890ea83bd488200011c47
partition_1_log__wwid = 3624a9370884890ea83bd488200011c4a
partition_2_data__wwid = 3624a9370884890ea83bd488200011c48
partition_2_log__wwid = 3624a9370884890ea83bd488200011c4b
partition_3_data__wwid = 3624a9370884890ea83bd488200011c49
partition_3_log__wwid = 3624a9370884890ea83bd488200011c4c
[trace]
ha_fcclient = info
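Rather than comparing the files by eye, a checksum or diff over SSH can be used. This is a minimal sketch assuming shn1 is a source node, shn5 is a target node, and passwordless SSH is available:
# Full comparison of the two files
diff <(ssh shn1 cat /hana/shared/SH1/global/hdb/custom/config/global.ini) \
     <(ssh shn5 cat /hana/shared/SH1/global/hdb/custom/config/global.ini)
# Or compare checksums only
ssh shn1 md5sum /hana/shared/SH1/global/hdb/custom/config/global.ini
ssh shn5 md5sum /hana/shared/SH1/global/hdb/custom/config/global.ini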
Step 3. As the <sid>adm user, use hdbnsutil to change the host names in the system topology.
Note failover in this example is done as follows:
SHN1, Worker Node -> SHN5, Worker Node
SHN2, Worker Node -> SHN6, Worker Node
SHN3, Worker Node -> SHN7, Worker Node
SHN4, Standby Node -> SHN8, Standby Node
hdbnsutil -convertTopology
A successful topology conversion will output the following:
nameserver shn5:30001 not responding.
checking 1 master lock file(s) ....................................... ok
load(/usr/sap/SH1/HDB00/exe/python_support/hdb_ha/fcClient.py)=1
attached device '/dev/mapper/3624a9370884890ea83bd488200011c47' to path '/hana/data/SH1/mnt00001'
attached device '/dev/mapper/3624a9370884890ea83bd488200011c4a' to path '/hana/log/SH1/mnt00001'
Opening persistence ...
sh1adm: no process found
hdbrsutil: no process found
run as transaction master
converting topology from cloned instance...
- keeping instance 00
- changing host shn1 to shn5
- changing host shn2 to shn6
- changing host shn3 to shn7
- changing host shn4 to shn8
- keeping DatabaseName SH1
detached device '/dev/mapper/3624a9370884890ea83bd488200011c47' from path '/hana/data/SH1/mnt00001'
detached device '/dev/mapper/3624a9370884890ea83bd488200011c4a' from path '/hana/log/SH1/mnt00001'
done.
Step 4. Start the SAP HANA system.
Use the sapcontrol utility to start the SAP HANA system.
/usr/sap/hostctrl/exe/sapcontrol -nr <instance number> -function StartSystem HDB
A successful start request will respond as follows:
09.07.2020 08:46:57
Start
OK
To check on the status of the startup process for SAP HANA use the sapcontrol command with the GetSystemInstanceList function:
/usr/sap/hostctrl/exe/sapcontrol -nr <instance number> -function GetSystemInstanceList
Once the instances on each node have been started they should be displayed as "GREEN".
13.07.2020 04:33:51
GetSystemInstanceList
OK
hostname, instanceNr, httpPort, httpsPort, startPriority, features, dispstatus
shn7, 0, 50013, 50014, 0.3, HDB|HDB_WORKER, GREEN
shn8, 0, 50013, 50014, 0.3, HDB|HDB_STANDBY, GREEN
shn5, 0, 50013, 50014, 0.3, HDB|HDB_WORKER, GREEN
shn6, 0, 50013, 50014, 0.3, HDB|HDB_WORKER, GREEN
Automated Failover for ActiveCluster
It is possible to use the SAP HANA Storage API connector in conjunction with an ActiveCluster implementation with preferred paths set for each host. In the event of a single site failure, the Storage API connector will detach the volumes from compute 1, attach them to compute 2, and then start the instance.
This approach is best considered for Scale Up implementations where the topology is a single worker and single standby node.
1. Install the SAP HANA system, but during installation ensure that an additional system is added with the standby role. The volumes should already be in a POD, stretched to the additional array, and connected to the host group containing both compute instances on both arrays. Each host should also have a preferred path set (if required); the preferred array for a given host must be set identically on both arrays.
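For reference, the [storage] section of global.ini for a Scale Up system using the Fibre Channel storage connector follows the same pattern as the Scale Out example earlier in this guide, with a single data and log partition. The sketch below is illustrative only - the WWIDs are placeholders and must be replaced with the WWIDs of the actual data and log volumes:
[storage]
ha_provider = hdb_ha.fcClient
partition_*_*__prtype = 5
partition_1_data__wwid = <data volume WWID>
partition_1_log__wwid = <log volume WWID>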
In the event of storage failure
The failover will be transparent. Both nodes (worker and standby) will have access to the storage.
A storage failure will show paths failing in one priority group; in this instance the preferred array has remained available.
3624a9370884890ea83bd488200012863 dm-0 PURE,FlashArray
size=512G features='0' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=50 status=active
| |- 0:0:2:253 sdi 8:128 active ready running
| |- 0:0:6:253 sdq 65:0 active ready running
| |- 0:0:3:253 sdk 8:160 active ready running
| |- 0:0:5:253 sdo 8:224 active ready running
| |- 1:0:3:253 sdw 65:96 active ready running
| |- 1:0:2:253 sdu 65:64 active ready running
| |- 1:0:7:253 sdae 65:224 active ready running
| `- 1:0:8:253 sdag 66:0 active ready running
`-+- policy='queue-length 0' prio=0 status=enabled
  |- 0:0:1:253 sdg 8:96 failed faulty running
  |- 0:0:0:253 sde 8:64 failed faulty running
  |- 0:0:4:253 sdm 8:192 failed faulty running
  |- 0:0:7:253 sds 65:32 failed faulty running
  |- 1:0:4:253 sdy 65:128 failed faulty running
  |- 1:0:5:253 sdaa 65:160 failed faulty running
  |- 1:0:6:253 sdac 65:192 failed faulty running
  `- 1:0:9:253 sdai 66:32 failed faulty running
3624a9370884890ea83bd488200012862 dm-1 PURE,FlashArray
size=1.0T features='0' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=50 status=active
| |- 0:0:2:254 sdj 8:144 active ready running
| |- 0:0:5:254 sdp 8:240 active ready running
| |- 0:0:3:254 sdl 8:176 active ready running
| |- 0:0:6:254 sdr 65:16 active ready running
| |- 1:0:7:254 sdaf 65:240 active ready running
| |- 1:0:3:254 sdx 65:112 active ready running
| |- 1:0:2:254 sdv 65:80 active ready running
| `- 1:0:8:254 sdah 66:16 active ready running
`-+- policy='queue-length 0' prio=0 status=enabled
  |- 0:0:4:254 sdn 8:208 failed faulty running
  |- 0:0:1:254 sdh 8:112 failed faulty running
  |- 0:0:0:254 sdf 8:80 failed faulty running
  |- 0:0:7:254 sdt 65:48 failed faulty running
  |- 1:0:4:254 sdz 65:144 failed faulty running
  |- 1:0:6:254 sdad 65:208 failed faulty running
  |- 1:0:5:254 sdab 65:176 failed faulty running
  `- 1:0:9:254 sdaj 66:48 failed faulty running
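A quick way to spot this condition across all devices is to count the failed paths, for example:
multipath -ll | grep -c "failed faulty"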
In the event of host failure
The SAP HANA storage API connector will disconnect the volumes from the failed host and connect them to the standby, starting the instance. Any applications communicating with the instance will need to be aware that the instance has failed over.
Using the GetProcessList function of the sapcontrol utility, it can be seen that the indexserver and hdbxsengine have failed over and are now starting on the standby node.
13.07.2020 05:17:39
GetProcessList
OK
name, description, dispstatus, textstatus, starttime, elapsedtime, pid
hdbdaemon, HDB Daemon, YELLOW, Initializing, 2020 07 13 05:01:49, 0:15:50, 6803
hdbcompileserver, HDB Compileserver, GREEN, Running, 2020 07 13 05:01:53, 0:15:46, 6863
hdbnameserver, HDB Nameserver, GREEN, Running, 2020 07 13 05:01:49, 0:15:50, 6821
hdbpreprocessor, HDB Preprocessor, GREEN, Running, 2020 07 13 05:01:53, 0:15:46, 6866
hdbwebdispatcher, HDB Web Dispatcher, GREEN, Running, 2020 07 13 05:01:54, 0:15:45, 6916
hdbindexserver, HDB Indexserver-SH1, YELLOW, Initializing, 2020 07 13 05:17:32, 0:00:07, 17024
hdbxsengine, HDB XSEngine-SH1, YELLOW, Initializing, 2020 07 13 05:17:32, 0:00:07, 17027
Manual System Failover for ActiveDR
The SAP HANA environment is configured as follows:
For both Scale Up and Scale Out, the initial configuration is the same as the configuration for ActiveCluster, with the following exceptions:
- A different SID can be used on the source and target systems.
- There are no preferred paths, and the world wide identifiers (WWIDs) of the volumes will be different on Array 1 and Array 2. For Scale Out scenarios the [storage] section of the global.ini file will need to be updated with the correct volume WWIDs on the target.
To bring the SAP HANA instance up on the target array for failover or migration, the following steps must be followed:
1. Stop the Source SAP HANA System
/usr/sap/hostctrl/exe/sapcontrol -nr <instance number> -function Stop
Wait until the instances have stopped on each system by using the following command:
/usr/sap/hostctrl/exe/sapcontrol -nr <instance number> -function GetProcessList
2. Unmount the log and data volumes
For a Scale Out environment each node should be checked to ensure no volumes are mounted.
umount <path to volume>
3. Demote the Source array POD
Do not demote the POD until all volumes in the POD have been unmounted from the operating system.
On the source array execute the following:
purepod demote --quiesce <POD NAME>
Wait until the POD is listed as "demoted" on the source array before continuing.
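The promotion status can be checked from the CLI - for example, with the POD name as a placeholder:
purepod list <POD NAME>
Repeat the command until the promotion status is shown as "demoted" for the POD.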
4. Promote the Target array POD
On the target array execute the following:
purepod promote <POD NAME>
5. Connect the log and data volumes to the target system(s), mount them at the relevant SAP HANA volume locations (Scale Up) or update the global.ini file as set out in Step 2 of the Scale Out Failover Process (Scale Out), and then start the instance.
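For a Scale Up system the sequence on the target side looks similar to the sketch below. This is illustrative only - the POD, volume, and host names are placeholders, the device WWIDs on Array 2 will differ from those on Array 1, and the rescan step depends on the operating system tooling available:
# On Array 2 (target): promote the POD and connect the volumes to the target host
purepod promote <POD NAME>
purevol connect --host <target host> <POD NAME>::<data volume>
purevol connect --host <target host> <POD NAME>::<log volume>
# On the target host: rescan for the devices, mount them, and start SAP HANA
rescan-scsi-bus.sh
multipath -ll
mount /dev/mapper/<data volume WWID> /hana/data
mount /dev/mapper/<log volume WWID> /hana/log
/usr/sap/hostctrl/exe/sapcontrol -nr <instance number> -function Start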