Troubleshooting FlashArray Storage Replication Adapter (SRA)
This article provides a guide to troubleshooting the Pure Storage FlashArray SRA.
Locating logs - Linux appliance
The SRA's logs are located in:
This directory should have the logs referenced below.
The SRM's logs are located in:
Locating logs - Windows appliance
The SRA's logs are located in:
%PROGRAMDATA%\VMware\VMware vCenter Site Recovery Manager\Logs\SRAs\purestorage
Each invocation of the SRA produces one log file. Sort by Date Modified to see the commands executed in chronological order.
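If there are many log files, a short script can list them the same way. The following is a minimal sketch, not a Pure Storage tool. It assumes the default %PROGRAMDATA% location above and that file modification time reflects invocation order. The function name sra_logs_chronological is hypothetical.

```python
import os
from pathlib import Path


def sra_logs_chronological(log_dir=None):
    """Return SRA log files oldest-first (equivalent to sorting by Date Modified)."""
    if log_dir is None:
        # Assumption: default Windows location described in this article.
        program_data = os.environ.get("PROGRAMDATA", r"C:\ProgramData")
        log_dir = (Path(program_data) / "VMware" / "VMware vCenter Site Recovery Manager"
                   / "Logs" / "SRAs" / "purestorage")
    files = [p for p in Path(log_dir).iterdir() if p.is_file()]
    # Sort by last-modified timestamp so commands appear in execution order.
    return sorted(files, key=lambda p: p.stat().st_mtime)
```

For example, `for p in sra_logs_chronological(): print(p.name)` prints file names such as discoverArrays_..., where the prefix is the SRM command that invocation handled.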
The SRM's logs are located in:
%PROGRAMDATA%\VMware\VMware vCenter Site Recovery Manager\Logs
Look for the vmware-dr-##.log files; the file with the largest ## is the most recent. The SRM logs are useful for diagnosing problems when (a) the SRA responded correctly but SRM still failed an operation, or (b) the SRA crashed on launch before it could log anything. Before collecting SRM logs, quit SRM and wait a few seconds so that all logs are flushed to disk.
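Picking "the largest ##" should be done numerically, not alphabetically: sorted as text, vmware-dr-9.log would come after vmware-dr-10.log. The sketch below assumes the file names follow exactly the vmware-dr-##.log pattern described above; newest_srm_log is a hypothetical helper name.

```python
import re

# Assumption: SRM log names look like "vmware-dr-<number>.log".
_DR_LOG = re.compile(r"^vmware-dr-(\d+)\.log$", re.IGNORECASE)


def newest_srm_log(names):
    """Return the vmware-dr-##.log name with the largest number, or None."""
    best, best_n = None, -1
    for name in names:
        m = _DR_LOG.match(name)
        # Compare the numeric suffix as an integer, not as a string.
        if m and int(m.group(1)) > best_n:
            best, best_n = name, int(m.group(1))
    return best
```

In practice you would pass it `os.listdir()` of the SRM log folder.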
The SRA installer's logs are located in:
To verify that the SRA has been installed correctly, do the following:
- Confirm that the SRA is listed in Programs and Features.
- Find where SRM is installed by looking at HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\VMware vCenter Site Recovery Manager\InstallPath in the Registry. It is usually C:\Program Files\VMware\VMware vCenter Site Recovery Manager\bin.
- In the SRM folder, navigate to the "storage\sra\purestorage" folder.
- Confirm that the following files are present (filename case may vary):
command.pl, PureSRA.exe, PureSRA.pdb, PureSRA.exe.config, PureStorage.Rest.dll, PureStorage.Rest.pdb, Newtonsoft.Json
- Right-click PureSRA.exe and select Properties. Verify that it is the version you expect and that the binary is signed by Pure Storage.
- Confirm that .NET 3.5 is installed by checking the presence of:
The SRA log starts by recording the SRA version and information about the environment it runs in. The SRA expects to run as an administrator and as a 64-bit process; it will not work correctly otherwise. Look for entries like the following to confirm this is the case:
[02/12/2015 11:03:30,Logging session for discoverArrays,V] Process is 64-Bit.
[02/12/2015 11:03:30,Logging session for discoverArrays,V] Running as the administrator.
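When checking many log files, the two environment lines can be tested for automatically. This is a minimal sketch that assumes the SRA writes these messages with exactly the wording shown in the example entries; if the wording differs between SRA versions, adjust the markers.

```python
# Assumption: marker text copied verbatim from the example log entries.
REQUIRED_ENV_LINES = {
    "64-bit process": "Process is 64-Bit.",
    "administrator": "Running as the administrator.",
}


def missing_environment_checks(log_text):
    """Return the names of environment checks whose log line is absent."""
    return [name for name, marker in REQUIRED_ENV_LINES.items()
            if marker not in log_text]
```

An empty result means both lines were found; anything else points to a 32-bit launch or a non-administrator account.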
Confirming Array Pair
The SRA logs the input it receives from SRM. Look for "Received input:" followed by an XML string near the top of the log file. The XML string usually contains two Connection nodes, with the ids "localArray" and "peerArray". They identify the pair of arrays the operation is applied to. This information is entered by the user when configuring the SRA from inside SRM, so make sure these are actually the arrays you are using for SRM. Example:
<Connection id="localArray"> <Addresses> <Address id="spA">vss-purity-vm1.dev.purestorage.com</Address> </Addresses> ... </Connection>
<Connection id="peerArray"> <Addresses> <Address id="spA">vss-purity-vm2.dev.purestorage.com</Address> </Addresses> ... </Connection>
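To pull the array pair out of a long "Received input:" string, the XML can be parsed rather than read by eye. This is a sketch under assumptions: you have already copied the complete XML document that follows "Received input:", and the Connection and Address element names match the example above. Tags are compared by local name so the SRM namespace does not matter; array_pair is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def array_pair(xml_text):
    """Map each Connection id (e.g. localArray, peerArray) to its addresses."""
    root = ET.fromstring(xml_text)
    pair = {}
    for conn in root.iter():
        if _local(conn.tag) == "Connection":
            pair[conn.get("id")] = [addr.text.strip() for addr in conn.iter()
                                    if _local(addr.tag) == "Address" and addr.text]
    return pair
```

Compare the returned addresses with the arrays you actually intend SRM to use.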
Each entry in the SRA log is associated with a verbosity level. Search for ",E]" and ",W]" in the log file to see the logged Errors and Warnings, respectively. They usually indicate what went wrong. For example, the entries below indicate that the SRA could not connect to an array:
[02/11/2015 09:40:56,SRMCommandHandlerBase.cs:ConnectToInputArrays,W] Connection failed to FlashArray at 10.66.50.90 using connection localArray
[02/11/2015 09:40:56,SRMCommandHandlerBase.cs:ConnectToInputArrays,E] "PureRestException: HttpStatusCode = 'BadRequest', RestErrorCode = 'InvalidVersion', Details = '', InnerException = ''"
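The ",E]" / ",W]" search can also be done with a small parser that keeps each entry's timestamp and source together. This is a minimal sketch assuming every entry starts with "[timestamp,source,LEVEL]" as in the examples, and that lines not starting that way belong to the previous entry (for instance, multi-line XML). The names Entry and problems are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed entry layout: "[MM/DD/YYYY HH:MM:SS,<source>,<level>] <message>"
ENTRY = re.compile(r"^\[(?P<time>[^,\]]+),(?P<source>[^\]]*),(?P<level>[A-Z])\]\s?(?P<message>.*)$")


class Entry(NamedTuple):
    time: str
    source: str
    level: str
    message: str


def entries(lines):
    """Yield log entries, joining continuation lines onto the previous entry."""
    current = None
    for line in lines:
        m = ENTRY.match(line)
        if m:
            if current is not None:
                yield Entry(*current)
            current = [m["time"], m["source"], m["level"], m["message"]]
        elif current is not None:
            current[3] += "\n" + line.rstrip("\n")
    if current is not None:
        yield Entry(*current)


def problems(lines, levels=("E", "W")):
    """Return only the Error and Warning entries."""
    return [e for e in entries(lines) if e.level in levels]
```

Usage: `with open(path) as f: for e in problems(f): print(e.time, e.level, e.message)`.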
If the entire operation failed, the SRA outputs an error. Look for "Setting output:" followed by an XML string. You should see an Error node with an error code, such as:
<Error code="1004"> <purestorage:PureExceptionMessage>...</purestorage:PureExceptionMessage> <purestorage:LogFile>...</purestorage:LogFile> </Error>
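The code, message, and log file path can be read out of the "Setting output:" XML programmatically. This sketch assumes you pass it the complete Response document (as in the Error 1004 example later in this article) and that the purestorage prefix maps to http://www.purestorage.com/sra, as it does in that example. sra_error is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the "purestorage" prefix in the SRA's Response XML.
PURE_NS = "http://www.purestorage.com/sra"


def sra_error(response_xml):
    """Return code, PureExceptionMessage and LogFile from an SRA Response, or None."""
    root = ET.fromstring(response_xml)
    for elem in root.iter():
        if elem.tag.rsplit("}", 1)[-1] == "Error":
            return {
                "code": int(elem.get("code")),
                "message": elem.findtext(f"{{{PURE_NS}}}PureExceptionMessage"),
                "log_file": elem.findtext(f"{{{PURE_NS}}}LogFile"),
            }
    return None  # no Error node: the operation did not fail as a whole
```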
The meanings of the error codes are listed below. For example, 1004 stands for "array unreachable".
WarningSyncInProgress = 500, // Defined by VMware
ErrorUnhandledException = 1001,
ErrorUnknownCommand = 1002,
ErrorPureException = 1003,
ErrorArrayUnreachable = 1004,
ErrorArrayUnauthorized = 1005,
ErrorArrayIdNotAvailable = 1006,
ErrorArrayIDMissing = 1007,
ErrorBadArrayPair = 1008,
ErrorVolumeNotInPGroup = 1009,
ErrorCannotFindSyncStatus = 1010,
ErrorCannotFindSnapshot = 1011,
ErrorCannotFindVolume = 1012,
ErrorTestFailoverStartInProgress = 1013,
ErrorVolumeConnectionFailed = 1014,
WarningCannotFindPgroup = 1015,
ErrorCannotCreatePgroup = 1016,
WarningDeviceAlreadyFailedOver = 1017,
WarningPrepareFailoverInProgress = 1018,
ErrorArrayInsufficientPermissions = 1019,
WarningHostConnectionFailed = 1020,
ErrorCannotCreateVolume = 1021,
ErrorCannotDisconnectVolume = 1022,
ErrorCannotRenameVolume = 1023,
ErrorVolumeNotDisconnected = 1024,
ErrorCannotDeleteVolume = 1025,
ErrorCannotSnapshotPGroup = 1026,
ErrorEmptyOrMissingDeviceID = 1027,
WarningAlreadyPerformedPrepareRestoreReplication = 1028,
WarningAlreadyPerformedRestoreReplication = 1029,
WarningAlreadyPerformedPrepareReverseReplication = 1030
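For scripted triage, the table above can be kept as a lookup. This is a sketch that simply transcribes the list; the dictionary and error_name are hypothetical names, and the list in this article remains the reference. Names beginning with "Warning" are treated as warnings, following the naming in the list.

```python
# Transcribed from the error code list above.
SRA_ERROR_CODES = {
    500: "WarningSyncInProgress",  # defined by VMware
    1001: "ErrorUnhandledException",
    1002: "ErrorUnknownCommand",
    1003: "ErrorPureException",
    1004: "ErrorArrayUnreachable",
    1005: "ErrorArrayUnauthorized",
    1006: "ErrorArrayIdNotAvailable",
    1007: "ErrorArrayIDMissing",
    1008: "ErrorBadArrayPair",
    1009: "ErrorVolumeNotInPGroup",
    1010: "ErrorCannotFindSyncStatus",
    1011: "ErrorCannotFindSnapshot",
    1012: "ErrorCannotFindVolume",
    1013: "ErrorTestFailoverStartInProgress",
    1014: "ErrorVolumeConnectionFailed",
    1015: "WarningCannotFindPgroup",
    1016: "ErrorCannotCreatePgroup",
    1017: "WarningDeviceAlreadyFailedOver",
    1018: "WarningPrepareFailoverInProgress",
    1019: "ErrorArrayInsufficientPermissions",
    1020: "WarningHostConnectionFailed",
    1021: "ErrorCannotCreateVolume",
    1022: "ErrorCannotDisconnectVolume",
    1023: "ErrorCannotRenameVolume",
    1024: "ErrorVolumeNotDisconnected",
    1025: "ErrorCannotDeleteVolume",
    1026: "ErrorCannotSnapshotPGroup",
    1027: "ErrorEmptyOrMissingDeviceID",
    1028: "WarningAlreadyPerformedPrepareRestoreReplication",
    1029: "WarningAlreadyPerformedRestoreReplication",
    1030: "WarningAlreadyPerformedPrepareReverseReplication",
}


def error_name(code):
    """Return the symbolic name for an SRA error code."""
    return SRA_ERROR_CODES.get(code, f"unknown code {code}")


def is_warning(code):
    """True if the code's name marks it as a warning rather than an error."""
    return error_name(code).startswith("Warning")
```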
The "PureExceptionMessage" portion of the error will contain more specific information about why the operation failed. Examples are under the subheadings for specific errors below.
Additionally, the SRA logs the HTTP requests (URL only) and their response codes. Look for "Rest Library transcript:" in the log. In the HTTP transcript that follows, look for "PureStorage.Rest Error:". Note that many HTTP errors are benign and expected (e.g. we test for the existence of a volume by asking the array about it; if the array responds with a does-not-exist error, we know it doesn't exist), so view errors in the HTTP transcript in the context of other clues in the log.
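A sketch for collecting only the REST errors that appear after the transcript marker is below. It assumes each "PureStorage.Rest Error:" message sits on a single line after "Rest Library transcript:"; the exact transcript layout is not documented here. rest_errors is a hypothetical helper, and as noted above, its hits still need to be judged in context because many are expected.

```python
def rest_errors(lines):
    """Return (line_number, text) for REST errors logged after the transcript marker."""
    in_transcript = False
    hits = []
    for lineno, line in enumerate(lines, start=1):
        if "Rest Library transcript:" in line:
            in_transcript = True
        # Only count errors that appear once the HTTP transcript has started.
        if in_transcript and "PureStorage.Rest Error:" in line:
            hits.append((lineno, line.rstrip("\n")))
    return hits
```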
Error 1004 (ErrorArrayUnreachable)
[01/04/2018 11:54:01,DiscoverArrays.cs:ProcessCommand,V] Exiting Setting output: <?xml version="1.0" encoding="utf-8"?> <Response xmlns="http://www.vmware.com/srm/sra/v2" xmlns:purestorage="http://www.purestorage.com/sra"> <Error code="1004"> <purestorage:PureExceptionMessage>The remote server returned an error: (400) Bad Request. Message from Purity='ctx:,msg:invalid credentials'</purestorage:PureExceptionMessage> <purestorage:LogFile>C:\ProgramData\VMware\VMware vCenter Site Recovery Manager\Logs\SRAs\purestorage\discoverArrays_2018-01-04-11-53-58-443789-6607c4a5-ed27-4f4a-8d3b-46c8db303e85.log</purestorage:LogFile> </Error> </Response>
This usually means the array managers are configured incorrectly; in this example, Purity rejected the supplied credentials. If it is a "one to many" or "many to one" configuration, ensure that the Pure FlashArray username and password are the same for each array.
Tunable: HTTP Timeout
If the SRA does not receive a response from the array within 60 seconds for a REST call, the call times out (the logs will mention "timed out").
To change the HTTP timeout value from the default value of 60 seconds, do the following (on all machines where the SRA is installed):
- Launch the Registry Editor.
- Create the registry key HKEY_LOCAL_MACHINE\SOFTWARE\PureStorage\SRA (or navigate to it if it already exists).
- Create a DWORD Value named "HTTPTimeoutInSeconds", if it doesn't exist already.
- Set the value to the desired HTTP timeout in seconds (e.g. 120, entered in decimal), and click OK.
The change will take effect the next time the SRA is invoked.
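On machines where editing the Registry by hand is inconvenient, the same DWORD can be written with Python's standard winreg module. This is a sketch, not a Pure Storage utility: it must run elevated on Windows, and it assumes the SRA reads the 64-bit registry view (the SRA runs as a 64-bit process), which is why KEY_WOW64_64KEY is requested. set_sra_dword is a hypothetical helper.

```python
import sys

SRA_KEY = r"SOFTWARE\PureStorage\SRA"


def set_sra_dword(name, value):
    """Create HKLM\\SOFTWARE\\PureStorage\\SRA if needed and set a DWORD value."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("DWORD values must fit in 32 bits")
    if sys.platform != "win32":
        raise OSError("the SRA registry settings only exist on Windows")
    import winreg  # Windows-only standard library module
    # Write to the 64-bit view even if this Python interpreter is 32-bit.
    access = winreg.KEY_SET_VALUE | winreg.KEY_WOW64_64KEY
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, SRA_KEY, 0, access) as key:
        winreg.SetValueEx(key, name, 0, winreg.REG_DWORD, value)
```

For example, `set_sra_dword("HTTPTimeoutInSeconds", 120)` sets a 120-second timeout; repeat it on every machine where the SRA is installed.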
Tunable: Host Connection on SRM Failover
By default, the SRA prioritizes host group connections when SRM asks it to connect to hosts (e.g. if a host group HG contains a host H, the SRA connects to HG when asked to connect to H). Most users should keep this behavior.
However, to disable this behavior (i.e. connect only to individual hosts on failover), add a DWORD Value named "DisableHostGroupConnectionOnFailover" under the registry key HKEY_LOCAL_MACHINE\SOFTWARE\PureStorage\SRA and set its value to 1.
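The effect of this setting can be illustrated with a small model of the documented rule. This is an illustration only, not the SRA's code: the host-to-host-group mapping, the tuple output, and the assumption that a host group is connected once even when several of its hosts are requested are all hypothetical.

```python
def connection_targets(requested_hosts, host_to_group, disable_hostgroup=False):
    """Model: prefer a host's host group unless DisableHostGroupConnectionOnFailover=1."""
    targets = []
    for host in requested_hosts:
        group = None if disable_hostgroup else host_to_group.get(host)
        target = ("hostgroup", group) if group else ("host", host)
        # Assumption: each target is connected only once.
        if target not in targets:
            targets.append(target)
    return targets
```

With H1 and H2 in HG and H3 standalone, the default yields HG plus H3; with the setting enabled, it yields H1, H2, and H3 individually.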
One useful tool for debugging problems is Fiddler. Follow the "Decrypting HTTPS-protected traffic" instructions to set it up for HTTPS traffic. Repeat the failed SRM operation with Fiddler running to see the HTTP traffic. You can save the Fiddler trace to a file and give it to the development team for further debugging.
Also note that users must rescan for SRAs in SRM after upgrading the SRA (this is covered in the manual accompanying the SRA); otherwise they may see errors.
If the existing logs do not provide sufficient information, it may be useful to enable a higher logging level. Note that more detailed logging can fill up the storage the logs are written to, so be very careful: enable it only for short periods while the customer reproduces the issue, and disable it immediately afterward. More details can be found in this VMware KB.