Version: 2.12.0

Controlled vs Uncontrolled Failover

Before starting any failover procedure, it's important to understand the distinction between controlled and uncontrolled failover, as well as the potential data protection implications associated with each. In Eyeglass DR Assistant, selecting the appropriate failover method directly affects data integrity, recovery efforts, and the overall success of the disaster recovery process.

Using controlled failover ensures that both clusters are synced and that no data is lost. This should always be the first option if the Eyeglass VM maintains connectivity to both clusters. On the other hand, uncontrolled failover comes with significant risks, such as immediate data loss and the need for manual intervention during recovery. This method should only be used in critical scenarios where connectivity to the source cluster is lost and there is no other option.

Understanding these implications is essential to avoid data corruption, ensure a smooth recovery, and minimize potential downtime. Therefore, before initiating any actions, you must carefully assess which failover option applies to your situation.
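The assessment described above can be sketched as a small decision function. This is only an illustration of the rule, not part of Eyeglass: the names `FailoverMode` and `choose_failover_mode` are hypothetical, the two boolean inputs stand for whether the Eyeglass VM can currently reach each cluster, and the `NOT_POSSIBLE` outcome for an unreachable target cluster is an assumption that the text above does not state.

```python
from enum import Enum


class FailoverMode(Enum):
    """Hypothetical labels for the outcome of the assessment."""
    CONTROLLED = "controlled"
    UNCONTROLLED = "uncontrolled"
    NOT_POSSIBLE = "not possible"


def choose_failover_mode(source_reachable: bool, target_reachable: bool) -> FailoverMode:
    """Pick a failover mode from Eyeglass VM reachability to each cluster.

    Both arguments are assumed to be determined by the operator;
    this function does not query Eyeglass or any cluster.
    """
    if not target_reachable:
        # Assumption (not from the documentation): with no reachable
        # target cluster there is nothing to fail over to.
        return FailoverMode.NOT_POSSIBLE
    if source_reachable:
        # Connectivity to both clusters: controlled failover is always
        # the first option.
        return FailoverMode.CONTROLLED
    # Source cluster unreachable: uncontrolled failover is the last
    # resort, with the data loss and manual recovery it implies.
    return FailoverMode.UNCONTROLLED


if __name__ == "__main__":
    print(choose_failover_mode(source_reachable=True, target_reachable=True).value)
    print(choose_failover_mode(source_reachable=False, target_reachable=True).value)
```

The function deliberately takes no input about data access problems: reachability from the Eyeglass VM is the only criterion, so an uncontrolled failover is never selected while the source cluster is still reachable.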

Warning

Using uncontrolled failover means you are failing away from the data and will lose ALL changes at the moment the failover starts in Eyeglass.

This option should only be used when the Eyeglass VM cannot reach the source cluster.

Even if there are data access issues with PowerScale OneFS, do not use uncontrolled failover as long as Eyeglass shows green reachability on the Continuous Op Dashboard. Use controlled failover instead.

Recovery from uncontrolled failover is the customer's responsibility and is not covered by the support contract.

Recovery will involve coordinating with all vendors of the equipment in the customer's data center and obtaining approval from all relevant parties (e.g., those responsible for PowerScale OneFS, AD, DNS, other applications using PowerScale OneFS services, and physical infrastructure such as power and networking WAN links) before resuming operations.

DO NOT bring the source cluster back online without planning. Resync preparation does not run automatically in this mode, meaning both clusters will be writable. You should disconnect the source cluster and carefully plan a controlled recovery from the uncontrolled failover.

Reasons you might need to execute an uncontrolled failover include the following:

A WAN link to the data center is severed, with a lengthy repair time expected to restore service.
Extended power loss at the production data center.
A damaged cluster or a significant issue during an upgrade.
Equipment failure preventing access to the cluster, or application server failures with prolonged recovery times.
A network failure that blocks users from accessing storage and also affects the PowerScale OneFS management network.