Replacing a Boot Drive
This KB is a supplement to the Boot Drive Replacement Guides. At certain points those guides direct you to call Support for an action to be taken; this KB explains how to perform those steps and notes any caveats to the procedure. Check back on this KB each time you plan to perform this procedure, as it is updated with new caveats and steps as they arise.
IMPORTANT: Prior to the Swap
- Perform a Health Check: If there are any issues or open alerts, resolve them with support before proceeding.
- Make a note of the tunables set on the array; these will need to be set again after the swap is complete:
  From the array: puretune --list
  From logs on fuse: pureadm list-tunable
- Check whether the current Purity installation files are still in the /var/cache/purity directory. If they are not, ask Pure Storage to stage the proper upgrade files before the boot drive swap. The files are needed if you have to upgrade Purity on the newly replaced boot drive.
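The tunables noted before the swap can be compared against the listing taken afterwards so that nothing is forgotten. The following is a minimal sketch, assuming both listings were saved as plain text with one NAME=value entry per line; the actual output format of puretune --list may differ, and EXAMPLE_TUNABLE is a made-up name used only for illustration.

```python
def parse_tunables(text):
    """Parse 'NAME=value' lines into a dict, ignoring blanks and comments.

    The NAME=value layout is an assumption about the saved listing.
    """
    tunables = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        tunables[name.strip()] = value.strip()
    return tunables


def missing_or_changed(before, after):
    """Return tunables from `before` that are absent or different in `after`."""
    return {
        name: value
        for name, value in before.items()
        if after.get(name) != value
    }


# EXAMPLE_TUNABLE is hypothetical; it stands in for whatever was set on the array.
before = parse_tunables("PURITY_START_ON_BOOT=1\nEXAMPLE_TUNABLE=42\n")
after = parse_tunables("PURITY_START_ON_BOOT=1\n")
print(missing_or_changed(before, after))  # {'EXAMPLE_TUNABLE': '42'}
```

Anything reported by missing_or_changed is a candidate for Step 10 below.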
Configure Purity on New Boot Drive
Purity will not start on boot for replacement boot drives. This is to prevent a version mismatch while Purity is running.
Step 1: Verify Purity version on good controller (Example: CT0)
Check the Purity version on the good controller and confirm that the secondary is still not present:
    root@slc-420-ct0:/home/os76# purearray list --controller
    Name  Mode         Model   Version  Status
    CT0   primary      FA-4XX  4.5.3    ready
    CT1   not present  -       -        unknown
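When this check is repeated across several steps, it can help to read the controller table programmatically. The sketch below parses captured text in the layout shown above; the column order and the set of Mode values are taken only from the transcripts in this KB, so treat both as assumptions rather than a description of the tool.

```python
import re

# Assumed row layout: name, mode, model, version, status. The mode values
# listed here are only the ones that appear in this KB's transcripts.
ROW = re.compile(
    r"^(?P<name>CT\d+)\s+(?P<mode>primary|secondary|not present)\s+"
    r"(?P<model>\S+)\s+(?P<version>\S+)\s+(?P<status>.+?)\s*$"
)


def parse_controllers(output):
    """Return {controller name: row dict} for every recognizable row."""
    rows = {}
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows[match.group("name")] = match.groupdict()
    return rows


sample = """\
Name  Mode         Model   Version  Status
CT0   primary      FA-4XX  4.5.3    ready
CT1   not present  -       -        unknown
"""

controllers = parse_controllers(sample)
print(controllers["CT0"]["version"])  # 4.5.3
print(controllers["CT1"]["mode"])     # not present
```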
Step 2: From the existing controller (Example: CT0), check whether a connection to the new peer (bond0/haeth0) is available:
    os76@slc-420-ct0:~$ ip neighbor | grep bond0
    fe80::202:c903:a2:d261 dev bond0 lladdr 80:00:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:a2:d2:61 DELAY
Or, for haeth0:
    ip neighbor | grep haeth0
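The address needed for Step 3 is the link-local IPv6 neighbor on the interconnect. As a rough sketch, the helper below pulls such addresses out of captured ip neighbor output; it assumes the usual "ADDRESS dev INTERFACE ... STATE" line layout and skips entries whose state is FAILED or INCOMPLETE.

```python
import ipaddress


def link_local_neighbors(output, interface):
    """Return link-local IPv6 neighbors seen on `interface`.

    Entries in the FAILED or INCOMPLETE state are skipped because they
    have no resolved link-layer address.
    """
    found = []
    for line in output.splitlines():
        tokens = line.split()
        if len(tokens) < 4 or tokens[1] != "dev" or tokens[2] != interface:
            continue
        if tokens[-1] in ("FAILED", "INCOMPLETE"):
            continue
        try:
            address = ipaddress.ip_address(tokens[0])
        except ValueError:
            continue  # not an IP address, ignore the line
        if address.version == 6 and address.is_link_local:
            found.append(str(address))
    return found


sample = (
    "fe80::202:c903:a2:d261 dev bond0 lladdr "
    "80:00:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:a2:d2:61 DELAY\n"
)
print(link_local_neighbors(sample, "bond0"))  # ['fe80::202:c903:a2:d261']
```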
Step 3: Connect to peer (Example CT1) via bond0:
    os76@slc-420-ct0:~$ ssh os76@fe80::202:c903:a2:d261%bond0
    The authenticity of host 'fe80::202:c903:a2:d261%bond0 (fe80::202:c903:a2:d261%bond0)' can't be established.
    ECDSA key fingerprint is 79:34:86:07:fd:19:96:dc:1f:e6:ad:04:88:6c:0e:ed.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'fe80::202:c903:a2:d261%bond0' (ECDSA) to the list of known hosts.
    os76@fe80::202:c903:a2:d261%bond0's password:

    The programs included with the Ubuntu system are free software;
    the exact distribution terms for each program are described in the
    individual files in /usr/share/doc/*/copyright.

    Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
    applicable law.

    Fri Oct 02 09:26:50 2015
    Welcome os76. This is Purity Version 4.5.2 on FlashArray pure
    http://www.purestorage.com/
Step 4: Check version of Purity on the replaced controller (Example CT1)
    os76@pure-FAKOEm9N:~$ pureversion
    Product Version: 4.5.2
Depending on the version:
- If the Purity version is the same on both controllers, no further action is required; proceed to starting Purity.
- If the Purity version on the replaced boot drive (Example CT1) is not the same, you will need to install Purity before proceeding.
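The decision above comes down to an exact version match. A small sketch of that comparison, assuming the pureversion output contains a "Product Version:" line as in the transcript, and that the good controller's version has already been read from Step 1:

```python
import re


def product_version(pureversion_output):
    """Extract the value of the 'Product Version:' line, or None if absent."""
    match = re.search(r"Product Version:\s*(\S+)", pureversion_output)
    return match.group(1) if match else None


def needs_install(good_version, replaced_output):
    """True when the replaced controller's version differs from the good one."""
    replaced = product_version(replaced_output)
    if replaced is None:
        raise ValueError("no 'Product Version:' line found")
    return replaced != good_version


print(needs_install("4.5.3", "Product Version: 4.5.2"))  # True
print(needs_install("4.5.3", "Product Version: 4.5.3"))  # False
```

The comparison is deliberately a plain string match, since the KB asks for the versions to be the same rather than merely compatible.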
Step 5: Move files from good controller (Example CT0) to Replaced Controller (Example CT1)
There is a chance that the files you need are on the good controller. If they are not, however, you will need to SCP those files to the good controller via a Remote Assist session. Contact Support for the necessary files.
For this process you do not want to use the upgrade script, so make sure you have the .ppkg AND .sha1 files.
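Because the .sha1 file travels with the package, it can also be used to confirm that a copied .ppkg arrived intact. The sketch below assumes the .sha1 file holds the hex digest as its first whitespace-separated token (the layout sha1sum produces); that layout is an assumption, and the demonstration uses a throwaway file rather than a real package.

```python
import hashlib
import os
import tempfile


def sha1_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha1_file(package_path, sha1_path):
    """Compare a package against the digest stored in its .sha1 file.

    Assumes the digest is the first token in the .sha1 file.
    """
    with open(sha1_path) as handle:
        tokens = handle.read().split()
    if not tokens:
        return False
    return sha1_of(package_path) == tokens[0].lower()


# Demonstration with a temporary stand-in file, not a real Purity package.
with tempfile.TemporaryDirectory() as workdir:
    package = os.path.join(workdir, "example.ppkg")
    with open(package, "wb") as handle:
        handle.write(b"not a real package")
    with open(package + ".sha1", "w") as handle:
        handle.write(sha1_of(package) + "  example.ppkg\n")
    intact = matches_sha1_file(package, package + ".sha1")
    with open(package, "ab") as handle:
        handle.write(b"corruption")
    damaged = matches_sha1_file(package, package + ".sha1")

print(intact, damaged)  # True False
```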
Once you have the upgrade files needed, you can scp the Purity files matching the existing controller (in this example, 4.5.3) to the peer via bond0:
    root@slc-420-ct0:/home/os76# scp purity_4.5.3_201508120114+f96731f.ppkg* os76@\[fe80::202:c903:a2:d261%bond0\]:/home/os76/
    os76@fe80::202:c903:a2:d261%bond0's password:
    purity_4.5.3_201508120114+f96731f.ppkg        100% 1212MB 110.2MB/s   00:11
    purity_4.5.3_201508120114+f96731f.ppkg.sha1
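Note that the ssh and scp targets are written differently: both carry the %bond0 zone suffix, but scp additionally needs the address wrapped in square brackets so that the colons in the IPv6 address are not mistaken for the host/path separator. The helper names below are illustrative only; the sketch simply builds the two strings used in Steps 3 and 5.

```python
import ipaddress


def scoped(address, interface):
    """Return 'address%interface' after checking it is IPv6 link-local."""
    parsed = ipaddress.ip_address(address)
    if parsed.version != 6 or not parsed.is_link_local:
        raise ValueError(f"{address} is not an IPv6 link-local address")
    return f"{parsed}%{interface}"


def ssh_target(user, address, interface):
    """Target in the form used by ssh: user@address%interface."""
    return f"{user}@{scoped(address, interface)}"


def scp_target(user, address, interface, path):
    """Target in the form used by scp: user@[address%interface]:path."""
    return f"{user}@[{scoped(address, interface)}]:{path}"


print(ssh_target("os76", "fe80::202:c903:a2:d261", "bond0"))
# os76@fe80::202:c903:a2:d261%bond0
print(scp_target("os76", "fe80::202:c903:a2:d261", "bond0", "/home/os76/"))
# os76@[fe80::202:c903:a2:d261%bond0]:/home/os76/
```

The backslashes before the brackets in the transcript appear to be shell escaping; they are not part of the target itself.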
Step 6: As root on the replacement controller (CT1), upgrade purity to match versions:
    root@pure-FAKOEm9N:/home/os76# pureinstall purity_4.5.3_201508120114+f96731f.ppkg
    Verifying package...
    Installing Purity on alternate partition labeled second.
    Erasing Purity software image from alternate partition second to prepare for installation.
    WARNING: Do not interrupt this process!!
    Unpacking new Purity software.
    ............................................................
    Finalizing installation. This may take several minutes.
    Purity installed.
    Installation complete. The new Purity version will load at next reboot.
    Important! The first boot of a new Purity version may take longer if the
    new version includes controller firmware updates.
    DO NOT REBOOT THE CONTROLLER DURING THE FIRMWARE UPDATE.
    Refer to http://community.purestorage.com for more information about the
    Purity upgrade process and firmware updates.
NOTE: Check the timezone on both controllers using cat /etc/timezone. If both controllers have the same timezone, you do not need to set it; skip ahead to step 8.
Step 7: Set timezone if needed:
    root@pure-FAKOEm9N:~# puresetup timezone
    ##########################################
    # Welcome to the Purity Setup Wizard #
    ##########################################
    [Errno 111] Connection refused
    Error: Unable to communicate due to exception. Please try again.
    Changing the time zone will immediately stop Purity and require a reboot on this controller.
    Current time zone: America/Los_Angeles
    Change time zone [requires reboot] (y/n): y
    lio-drv disabling
    wait-for-state foed stop/waiting
    wait-for-state gui stop/waiting
    wait-for-state lio-drv stop/waiting
    Pure Storage is offline.
    Current default time zone: 'America/Denver'
    Local time is now: Fri Oct 2 10:52:53 MDT 2015.
    Universal Time is now: Fri Oct 2 16:52:53 UTC 2015.
    Confirm time zone change from America/Los_Angeles to America/Denver (y/N): y
    Tunable parameter set: PURITY_START_ON_BOOT=1
    Press ENTER to reboot
    This controller will be online after reboot.
    Broadcast message from pureeng@pure-FAKOEm9N (/dev/pts/3) at 10:53 ...
    The system is going down for reboot NOW!
    Broadcast message from pureeng@pure-FAKOEm9N (/dev/pts/3) at 10:53 ...
    The system is going down for reboot NOW!
NOTE: Since puresetup timezone reboots the controller, skip ahead to step 9.
Step 8: If timezone did not need to be changed, reboot:
    root@pure-FAKOEm9N:/home/os76# pureboot reboot --offline
    Broadcast message from pureeng@pure-FAKOEm9N (/dev/pts/3) at 9:47 ...
    The system is going down for reboot NOW!
    Broadcast message from pureeng@pure-FAKOEm9N (/dev/pts/3) at 9:47 ...
    The system is going down for reboot NOW!
Step 9: Watch for controllers to be online.
Here, CT1 is visible but not online yet:
    root@slc-420-ct0:/home/os76# purearray list --controller
    Name  Mode       Model   Version  Status
    CT0   primary    FA-4XX  4.5.3    ready
    CT1   secondary  FA-4XX  4.5.3    not ready
After waiting a couple more minutes, see both online:
    root@slc-420-ct0:/home/os76# purearray list --controller
    Name  Mode       Model   Version  Status
    CT0   primary    FA-4XX  4.5.3    ready
    CT1   secondary  FA-4XX  4.5.3    ready
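The "check, wait, check again" pattern in this step can be expressed as a simple polling loop. In the sketch below, get_statuses is a hypothetical callable that returns a mapping of controller name to status (for example, built from the controller listing); the 30-second interval and 15-minute timeout are arbitrary illustrative values, not recommendations from this KB.

```python
import time


def wait_for_ready(get_statuses, expected=("CT0", "CT1"), interval=30.0,
                   timeout=900.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until every expected controller reports 'ready'.

    `get_statuses` is a hypothetical callable returning {name: status}.
    Returns True once all are ready, False if the timeout passes first.
    """
    deadline = clock() + timeout
    while True:
        statuses = get_statuses()
        if all(statuses.get(name) == "ready" for name in expected):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Canned samples standing in for two successive checks of the array.
samples = iter([
    {"CT0": "ready", "CT1": "not ready"},
    {"CT0": "ready", "CT1": "ready"},
])
result = wait_for_ready(lambda: next(samples), interval=0, sleep=lambda s: None)
print(result)  # True
```

The sleep and clock parameters exist only so the loop can be exercised without real waiting.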
Step 10: Set the tunables on the new controller to match the previous configuration.
Step 11: Test ssh peer
Make sure that you can "ssh peer" to the controller with the new boot drive. If you have any problems doing this, see KB: Unable to SSH to Peer after Controller Replacement.
Swapping Both Boot Drives
In some rare cases we may need to swap both boot drives. If we do, please keep the following in mind:
- Replace the boot drive on the secondary first. This will need to be coordinated with the field technician if they are performing the swap.
- Ensure that the replaced boot drive is healthy, the proper Purity Version, and that the GUI has been synced before proceeding to replace the Primary.
- Once this has been confirmed, force a failover on the Primary and make sure that the failover completed without issue.
- Identify the new secondary controller and repeat the boot drive replacement procedure above.
Troubleshooting
If, after swapping the boot drive and starting Purity, foed gets stuck at:
    root@PURESTORAGE:~# pureadm start
    purity start/running
    platform: .done
    foed: done
    gui: ..........done
    rest: done
    platform_env: 0.done
    foed_env: 27.27.27.27.27.27.27.27.27.27.27.27.27.27.27.27.27.27.27.27.^C
and the rdmaoopsd logs contain something similar to:
    Jan 20 19:40:36 rdmaoopsd[MSG]: RDMA CM event 3/RDMA_CM_EVENT_ROUTE_ERROR (id 0x1c7f920/context 0x1c7cb40)
    Jan 20 19:40:36 rdmaoopsd[ERR]: Route resolution error for remote fe80::f652:1403:87:9121%8
    Jan 20 19:40:37 rdmaoopsd[MSG]: RDMA CM event 1/RDMA_CM_EVENT_ADDR_ERROR (id 0x1c7f920/context 0x1c7cb40)
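To confirm that a saved log shows this signature, the lines can be filtered for the rdmaoopsd route and address errors quoted above. This is only a sketch: the pattern is built from the three sample messages in this KB and may not cover every variant the daemon can log.

```python
import re

# Built from the sample messages above; other wordings may exist.
PATTERN = re.compile(
    r"rdmaoopsd\[\w+\]: .*"
    r"(RDMA_CM_EVENT_ROUTE_ERROR|RDMA_CM_EVENT_ADDR_ERROR|Route resolution error)"
)


def rdma_route_errors(lines):
    """Return the log lines that match the rdmaoopsd error signature."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


log = [
    "Jan 20 19:40:36 rdmaoopsd[MSG]: RDMA CM event 3/RDMA_CM_EVENT_ROUTE_ERROR (id 0x1c7f920/context 0x1c7cb40)\n",
    "Jan 20 19:40:36 rdmaoopsd[ERR]: Route resolution error for remote fe80::f652:1403:87:9121%8\n",
    "Jan 20 19:40:37 rdmaoopsd[MSG]: RDMA CM event 1/RDMA_CM_EVENT_ADDR_ERROR (id 0x1c7f920/context 0x1c7cb40)\n",
    "Jan 20 19:40:38 someotherd[MSG]: unrelated message\n",
]
print(len(rdma_route_errors(log)))  # 3
```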
Check the IB links to make sure they are showing full speed:
    root@ct1:/var/log/purity# purehw list | awk '$1 ~ /IB/'
    CT0.IB0  ok  -  4  0  56.00 Gb/s  -
    CT0.IB1  ok  -  4  1  56.00 Gb/s  -
    CT1.IB0  ok  -  4  0  56.00 Gb/s  -
    CT1.IB1  ok  -  4  1  56.00 Gb/s  -
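The same check can be made on captured output. The sketch below flags any IB row whose status is not "ok" or whose reported speed is below an expected value; it relies only on the first two columns and the "<number> Gb/s" field seen in the transcript, and the expected speed is passed in by the caller because the correct figure depends on the hardware (56.00 Gb/s is simply what this example shows).

```python
import re

SPEED = re.compile(r"(\d+(?:\.\d+)?)\s+Gb/s")


def degraded_ib_links(output, expected_gbps):
    """Return names of IB links that are not ok or slower than expected."""
    degraded = []
    for line in output.splitlines():
        tokens = line.split()
        if len(tokens) < 2 or ".IB" not in tokens[0]:
            continue  # not an IB row
        speed = SPEED.search(line)
        healthy = (
            tokens[1] == "ok"
            and speed is not None
            and float(speed.group(1)) >= expected_gbps
        )
        if not healthy:
            degraded.append(tokens[0])
    return degraded


sample = """\
CT0.IB0 ok - 4 0 56.00 Gb/s -
CT0.IB1 ok - 4 1 56.00 Gb/s -
CT1.IB0 ok - 4 0 56.00 Gb/s -
CT1.IB1 ok - 4 1 56.00 Gb/s -
"""
print(degraded_ib_links(sample, expected_gbps=56.0))  # []
```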
If those are fine, you should be able to resolve it by restarting rdmaoopsd on the controller with the error:
    root@PURESTORAGE:/var/log/purity# service rdmaoopsd restart
    rdmaoopsd stop/waiting
    rdmaoopsd start/running, process 46126
Check the status of Purity and you should find it completing:
    root@PURESTORAGE:/var/log/purity# pureadm wait
    platform: done
    foed: done
    gui: done
    rest: done
    platform_env: 0.done
    foed_env: 2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.0.done
    remote_patch: done
    driver: done
    san: ......done
    health: done
    Broadcast Message from ct0 (somewhere) at 11:56 ...
    Purity Information System Status
    ================================
    Purity has successfully started for the first time after an install or upgrade.
    Purity 3.4.3 (201405140754+d5af1e5-r6) is now set to be the default.