S.M.A.R.T. Alerts in vmkernel Log with FlashArray™ Hardware-backed Volumes
When investigating an issue in an ESXi host's logs, you may see a significant number of messages similar to the following in vmkernel.log for SCSI devices:
Cmd(0x45d96d9e6f48) 0x85, CmdSN 0x6 from world 2099867 to dev "naa.624a9370f439f7c5a4ab425000024d83" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0
Or this for NVMe-oF, also in vmkernel.log:
WARNING: NvmeScsi: 172: SCSI opcode 0x85 (0x45d9757eeb48) on path vmhba67:C0:T1:L258692 to namespace eui.00f439f7c5a4ab4224a937500003f285 failed with NVMe error status: 0x1 translating to SCSI error ScsiDeviceIO: 4131: Cmd(0x45d9757eeb48) 0x85, CmdSN 0xc from world 2099855 to dev "eui.00f439f7c5a4ab4224a937500003f285" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0
ESXi regularly checks the S.M.A.R.T. status of attached storage devices, including array-backed devices that aren't local. When the FlashArray software receives this SCSI command (opcode 0x85), it rejects the command and returns the following status and sense data to the ESXi host:
failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0
virten.net has a powerful tool for decoding these codes. Pasting this output into that site displays the following details:
| Field | Code | Name | Description |
|---|---|---|---|
| Host Status | [0x0] | OK | This status is returned when there is no error on the host side. This is when you will see if there is a status for a Device or Plugin. It is also when you will see Valid sense data instead of Possible sense Data. |
| Device Status | [0x2] | CHECK_CONDITION | This status is returned when a command fails for a specific reason. When a CHECK CONDITION is received, the ESX storage stack will send out a SCSI command 0x3 (REQUEST SENSE) in order to get the SCSI sense data (Sense Key, Additional Sense Code, ASC Qualifier, and other bits). The sense data is listed after Valid sense data in the order of Sense Key, Additional Sense Code, and ASC Qualifier. |
| Plugin Status | [0x0] | GOOD | No error. (ESXi 5.x / 6.x only) |
| Sense Key | [0x5] | ILLEGAL REQUEST | |
| Additional Sense Data | 20/00 | INVALID COMMAND OPERATION CODE | |
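The decoding shown in the table can also be done offline. The following is a minimal Python sketch, not an official tool: the function name `decode_status` and the lookup tables are hypothetical, and the tables contain only the codes that appear in this article, so any other value is reported as "unknown" rather than guessed.

```python
import re

# Lookup tables limited to the values shown in this article (assumption:
# a complete decoder would need the full SCSI code tables).
HOST_STATUS = {0x0: "OK"}
DEVICE_STATUS = {0x2: "CHECK_CONDITION"}
PLUGIN_STATUS = {0x0: "GOOD"}
SENSE_KEY = {0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {(0x20, 0x00): "INVALID COMMAND OPERATION CODE"}

# Matches the "H:.. D:.. P:.. Valid sense data: key asc ascq" portion
# of a vmkernel.log entry.
STATUS_RE = re.compile(
    r"H:(0x[0-9a-f]+) D:(0x[0-9a-f]+) P:(0x[0-9a-f]+) "
    r"(?:Valid|Possible) sense data: "
    r"(0x[0-9a-f]+) (0x[0-9a-f]+) (0x[0-9a-f]+)",
    re.IGNORECASE,
)


def decode_status(line):
    """Return the decoded status fields of a log line, or None if absent."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    host, device, plugin, key, asc, ascq = (int(g, 16) for g in match.groups())
    return {
        "host": HOST_STATUS.get(host, "unknown"),
        "device": DEVICE_STATUS.get(device, "unknown"),
        "plugin": PLUGIN_STATUS.get(plugin, "unknown"),
        "sense_key": SENSE_KEY.get(key, "unknown"),
        "additional_sense": ASC_ASCQ.get((asc, ascq), "unknown"),
    }


if __name__ == "__main__":
    sample = "failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0"
    for field, value in decode_status(sample).items():
        print(f"{field}: {value}")
```

Run against the sense data from this article, the sketch reports a sense key of ILLEGAL REQUEST and an additional sense of INVALID COMMAND OPERATION CODE, matching the table.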
The key detail is the Sense Key, which has a value of ILLEGAL REQUEST. The FlashArray software does not support S.M.A.R.T. SCSI requests from hosts, so it returns ILLEGAL REQUEST to tell the ESXi host that it doesn't support that request type.
This is for two reasons:
- Since FlashArray volumes are not physically attached storage devices on the ESXi host, S.M.A.R.T. checks from the ESXi host aren't necessary.
- The FlashArray software handles drive failures and monitors the health of the drives that back the volumes independently of ESXi. You can read more about this in this datasheet.
Pure has been working with VMware to reduce the noise and unnecessary concern caused by these errors. In vSphere 7.0U3c, VMware fixed this problem: the message is now logged only once, when the ESXi host boots, instead of as often as every 15 minutes.
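When triaging a log that still contains many of these entries, it can help to count them per device so they can be set aside as expected noise. The following is a rough Python sketch under stated assumptions: the function name `count_smart_rejections` is hypothetical, the pattern is derived only from the two sample messages in this article, and the script is meant to run against a copy of vmkernel.log rather than to represent any Pure or VMware tooling.

```python
import re
from collections import Counter

# Pattern built from the sample messages above (assumption: other ESXi
# builds may format the entry differently). It matches opcode 0x85 failing
# with CHECK CONDITION and sense data 0x5 0x20 0x0, and captures the device.
SMART_REJECT_RE = re.compile(
    r'Cmd\(0x[0-9a-fA-F]+\) 0x85, .*? to dev "([^"]+)" failed '
    r"H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0"
)


def count_smart_rejections(lines):
    """Count rejected S.M.A.R.T. requests per device identifier."""
    counts = Counter()
    for line in lines:
        for device in SMART_REJECT_RE.findall(line):
            counts[device] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_smart.py <path to a copy of vmkernel.log>
    with open(sys.argv[1], errors="replace") as log:
        for device, count in count_smart_rejections(log).most_common():
            print(f"{count:6d}  {device}")
```

Entries that match this pattern correspond to the unsupported S.M.A.R.T. request described in this article; any other failure on the same device would not be counted and still deserves a separate look.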