Occasional Drive Failure Alarms | Pure CBS on Azure
Symptoms
A CBS array might raise one of the following health alerts:
Alert code | Severity | Description |
---|---|---|
60 | Critical | Drive failure detected |
4 | Critical | Unexpected controller failover |
These alerts may be accompanied by increased front-end latency for approximately 1-2 minutes. Data access to the drives remains uninterrupted during this time.
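When reviewing an alert history, it can help to pick out the entries that match the two alert codes listed in this section. The following is a minimal sketch only: the alert records are plain dictionaries with hypothetical `code` and `severity` keys, not the schema of any real CBS or Purity export, and `matching_alerts` is an illustrative helper name.

```python
# Alert codes and descriptions taken from the table in this article.
KNOWN_CODES = {
    60: "Drive failure detected",
    4: "Unexpected controller failover",
}


def matching_alerts(alerts):
    """Return the alerts whose code and severity match the table above.

    Each alert is a dict with hypothetical keys "code" and "severity".
    This layout is an assumption made for illustration only.
    """
    return [
        alert
        for alert in alerts
        if alert.get("code") in KNOWN_CODES
        and str(alert.get("severity", "")).lower() == "critical"
    ]


# Hypothetical sample data; code 25 is a placeholder for an unrelated alert.
sample = [
    {"code": 60, "severity": "Critical"},
    {"code": 4, "severity": "critical"},
    {"code": 25, "severity": "Warning"},
]

for alert in matching_alerts(sample):
    print(alert["code"], KNOWN_CODES[alert["code"]])
```

A match on these codes alone does not confirm this issue; it only narrows down which alerts are worth comparing against the latency window described above.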
Applies to
All versions of Cloud Block Store array instances running in Azure.
This issue does not affect any Cloud Block Store instances running in AWS.
Cause
Investigation of the performance degradation and health alerts points to a Persistent Reservation timeout. Persistent Reservation (PR) commands are a group of SCSI control commands that CBS uses to request exclusive write access to Azure Managed Disks. This ensures that no other initiator can write to a disk, preventing data corruption.
In rare cases during an infrastructure update, creating a reservation on an Azure Managed Disk may take longer than the usual wait time. When this happens, the controllers must wait until the reservation attempt times out, which results in the drive failure detection and controller failover events.
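The sequence described above can be sketched as a simplified model. This is an illustration under stated assumptions, not the CBS implementation: the function name, the result type, and the latency and timeout values are all hypothetical, and the real timeout is not given in this article.

```python
from dataclasses import dataclass, field


@dataclass
class ReservationResult:
    acquired: bool          # True if the reservation completed in time
    waited_s: float         # How long the controller waited, in seconds
    alert_codes: list = field(default_factory=list)  # Alert codes raised


def attempt_reservation(reserve_latency_s, timeout_s):
    """Model one reservation attempt on a single disk.

    reserve_latency_s: how long the reservation takes to complete (hypothetical).
    timeout_s: how long the controller waits before giving up (hypothetical).
    """
    if reserve_latency_s <= timeout_s:
        # Normal case: the reservation completes and no alert is raised.
        return ReservationResult(True, reserve_latency_s)
    # Slow case: the controller waits out the full timeout, then the
    # drive failure (code 60) and controller failover (code 4) follow.
    return ReservationResult(False, timeout_s, [60, 4])


# Hypothetical values chosen only to show both branches.
normal = attempt_reservation(reserve_latency_s=0.5, timeout_s=10.0)
slow = attempt_reservation(reserve_latency_s=30.0, timeout_s=10.0)
print(normal)
print(slow)
```

In this model, the time spent in the slow branch is bounded by the timeout, never by the full reservation latency. This reflects the statement that the controllers wait until the reservation times out.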
Resolution
Although these alerts slightly impact latency and force a controller failover, they do not affect data access or data integrity. Our engineering team is actively working to address and mitigate the Persistent Reservation timeout issue in future releases.
If this limits your workload or if you have any questions, please contact support@purestorage.com for assistance.