Pure Technical Services

Occasional Drive Failure Alarms | Pure CBS on Azure


Symptoms

A Cloud Block Store (CBS) array might raise one of the following health alerts:

Alert code   Severity   Description
60           Critical   Drive failure detected
4            Critical   Unexpected controller failover


[Figure: example of alert]


These alerts may be accompanied by increased front-end latency for approximately 1 to 2 minutes. Data access to the drives remains uninterrupted during this time.

 

Applies to 

All versions of Cloud Block Store array instances running in Azure.

This issue does not affect any Cloud Block Store instances running in AWS.
 

Cause

Investigation of the performance degradation and health alerts points to a Persistent Reservation timeout. Persistent Reservation (PR) commands are a group of SCSI control commands that CBS uses to request exclusive write access to Azure Managed Disks. This ensures that no other initiator can write to a disk, which prevents data corruption.
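
The exclusive-write behavior described above can be illustrated with a small in-memory model. This is only a sketch for intuition: `ToyDisk`, `ReservationConflict`, and the initiator names `ct0` and `ct1` are hypothetical and are not part of any CBS, Azure, or SCSI API. Real PR commands carry reservation keys and types that are omitted here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


class ReservationConflict(Exception):
    """Raised when an initiator that does not hold the reservation tries to write."""


@dataclass
class ToyDisk:
    """Hypothetical in-memory stand-in for a shared disk (illustration only)."""
    registered: Set[str] = field(default_factory=set)
    holder: Optional[str] = None
    blocks: Dict[int, bytes] = field(default_factory=dict)

    def register(self, initiator: str) -> None:
        # An initiator must register before it is allowed to reserve.
        self.registered.add(initiator)

    def reserve(self, initiator: str) -> bool:
        # Grant the reservation only to a registered initiator, and only if
        # the disk is unreserved or already reserved by the same initiator.
        if initiator not in self.registered:
            return False
        if self.holder is None or self.holder == initiator:
            self.holder = initiator
            return True
        return False

    def release(self, initiator: str) -> None:
        # Only the current holder can release the reservation.
        if self.holder == initiator:
            self.holder = None

    def write(self, initiator: str, lba: int, data: bytes) -> None:
        # Writes from any initiator other than the holder are rejected.
        if self.holder != initiator:
            raise ReservationConflict(f"{initiator} does not hold the reservation")
        self.blocks[lba] = data
```

In this model, once `ct0` has reserved the disk, `disk.reserve("ct1")` returns `False` and `disk.write("ct1", ...)` raises `ReservationConflict`, so only one writer can modify the disk at a time.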

In rare cases during an infrastructure update, creating a reservation on an Azure Managed Disk may take longer than the usual wait time. As a result, the controllers must wait until the reservation times out, which leads to the drive failure detection and controller failover events.
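
The wait-then-time-out pattern described above can be sketched as follows. This is a simplified illustration, not the CBS implementation: the function names, the return strings, and the timeout and polling values are all assumptions, and the actual timeout values and detection logic are not documented here.

```python
import time
from typing import Callable


def reserve_with_timeout(try_reserve: Callable[[], bool],
                         timeout_s: float,
                         poll_interval_s: float = 0.01) -> str:
    """Poll try_reserve until it succeeds or timeout_s elapses.

    Returns "ok" on success, or "drive_failure_alert" if the reservation
    could not be created before the deadline (hypothetical outcome labels).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if try_reserve():
            return "ok"
        time.sleep(poll_interval_s)
    return "drive_failure_alert"


def make_backend(ready_after_calls: int) -> Callable[[], bool]:
    """Build a fake reservation call that succeeds on the Nth attempt."""
    calls = {"n": 0}

    def try_reserve() -> bool:
        calls["n"] += 1
        return calls["n"] >= ready_after_calls

    return try_reserve
```

With a backend that responds on the second attempt, `reserve_with_timeout(make_backend(2), timeout_s=1.0)` returns `"ok"`. With a backend that never responds in time, the caller blocks for the full timeout before reporting the failure, which mirrors the brief latency increase that accompanies the alerts.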

Resolution

Although these alerts slightly impact latency and force a controller failover, they do not affect data access or data integrity. Our engineering team is actively working to address and mitigate the Persistent Reservation timeout issue in future releases.

 

If this limits your workload or if you have any questions, please don’t hesitate to email support@purestorage.com for assistance.