Version: 4.0.0

Failover Planning with Superna

Introduction

Failover planning is essential for maintaining business operations during unexpected disruptions. This guide provides clear steps to prepare for, test, and execute failovers effectively, ensuring minimal downtime and secure data handling. It focuses on practical strategies to protect critical systems and maintain operational continuity.

Checklist to Plan for Failover

Use this checklist to ensure that your failover process is organized, documented, and ready when needed.

Steps Before Failover Day

  1. Document A DR Runbook Plan

    • Include a clear sequence of steps
    • List key contacts
    • Define the order of tasks and which contacts to involve
  2. Submit Support Logs For Failover Readiness Audit (7 days before the planned event)

    • When submitting a case, select Failover Case Type Planned
  3. Complete Failover Training Labs

  4. Review DR Design Best Practices And Failover Release Notes

  5. Upgrade Eyeglass To The Latest Version

    • Each Eyeglass release includes failover rules engine updates to improve reliability and avoid known failover issues.
    tip

    Review the Eyeglass PowerScale Edition Upgrade Guide for instructions.

  6. Test DR Procedures

  7. Benchmark Failover (Access Zone)

    • Prepare a test policy or runbook robot access zone for multi-policy testing
    • Perform test failovers with one, then two, then three policies
    • Record make writable times and calculate the average to estimate production failover times
    note
    • Fully configure the test access zone (SPNs, shares, exports, quotas) for accurate estimates.
    • If change rate is zero, skip creating changed data before failover.
    • Create as many shares under each policy as in production to benchmark rename step times.
    warning

    Failover logs include steps that run after the data becomes writable, so the total failover job time overstates the actual failover time. Calculate the duration of the make writable step from the logs.

  8. Benchmark Failover (DFS Mode)

    • Use an Access Zone with a DFS mode policy or create a test DFS mode policy
    • Copy test data into the path
    • Create shares as in production
    • Test failover with one, then two, then three policies
    • Calculate the average make writable time to estimate production failover times
    note

    Creating as many shares as in production helps benchmark the rename step, which runs in parallel.

    warning

    Failover logs include steps that run after the data becomes writable. Calculate the duration of the make writable step from the logs.

  9. Create A Contact List For Failover Day

    • Active Directory administrator
    • DNS administrator
    • Cluster storage administrator
    • Workstation and server administrators
    • Application team for dependent applications
    • Change management team for the outage window
  10. Reduce Failover And Failback Time With Domain Mark

    • Run manual domain mark jobs on all SyncIQ policy paths
    • This speeds up failover by reducing domain mark time
    • Domain Mark Details
  11. Count Shares, Exports, NFS Aliases, And Quotas On Source And Target

    • Validate that configuration items are synced correctly
    • Ensure no quotas are synced on the target (only shares, exports, and NFS aliases)
    • Use OneFS UI and Superna Eyeglass DR Dashboard to verify
  12. Verify Dual Delegation In DNS Before Failover

    • Confirm DNS is pre-configured for failover across all SmartConnect Zones in the Access Zone
  13. Prepare For DFS Failover

    • Use dfsutil diag viewdfspath and dfsutil cache referral to verify that both target paths are present and that the correct path is active
    • Confirm DFS mounts have both referrals
    • Download dfsutil tool by OS type from Microsoft Docs Archive
  14. Communicate With Application Teams And Business Units

    • Schedule a maintenance window
    • Inform them that data loss will occur if data is written after the maintenance window start time
  15. Adjust SyncIQ Policy Schedules One Day Before Failover

    • Set all policies to run every 15 minutes or less
    • Avoid enabling run on change
    • This keeps data in sync and avoids long-running jobs that extend failover
    warning

    The following steps are mandatory: reviewing DR design best practices and failover release notes, running the latest Eyeglass version, testing DR procedures, and benchmarking failover (both Access Zone and DFS Mode). DR Assistant requires you to accept these conditions before you continue.
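
The benchmarking arithmetic in steps 7 and 8 can be sketched as follows. This is a minimal illustration, not an Eyeglass tool: the function names and sample timings are hypothetical, and multiplying the per-policy average by the production policy count is an assumption that treats policies as if they ran one after another, so treat the result as a rough estimate only.

```python
from statistics import mean


def average_make_writable_seconds(runs):
    """Return the average make writable time per policy.

    runs is a list of (policy_count, make_writable_seconds) tuples, one per
    test failover, with the time taken from the make writable step in the logs.
    """
    if not runs:
        raise ValueError("at least one benchmark run is required")
    per_policy = []
    for policy_count, seconds in runs:
        if policy_count <= 0:
            raise ValueError("policy_count must be positive")
        per_policy.append(seconds / policy_count)
    return mean(per_policy)


def estimate_production_seconds(runs, production_policy_count):
    """Scale the per-policy average to the production policy count (assumption)."""
    return average_make_writable_seconds(runs) * production_policy_count


# Hypothetical timings from test failovers with one, two, and three policies.
benchmark_runs = [(1, 60.0), (2, 120.0), (3, 180.0)]
print(estimate_production_seconds(benchmark_runs, production_policy_count=10))
```

Record the timings from the make writable step only, not from the total failover job time.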

Steps On The Failover Day

  1. Pause SMB And NFS IO Before Failover Starts

    • For SMB (2.0 or later), use the failover option to block IO on shares.
    • This option applies a deny read permission before failover and removes it after failover completes.
    • For NFS, disable the protocol or unmount exports to ensure no IO during failover.
  2. Force Run All SyncIQ Policies One Hour Before Planned Failover

    • Run each SyncIQ policy so that the failover run has less data to sync.
  3. Execute Failover

  4. Monitor Failover

  5. If Required, Refer To Data Recovery Guide

  6. Ensure Active Directory Administrator Is Available

    • ADSIedit recovery steps may be required.
    • Make sure the Active Directory administrator can access cluster machine accounts.
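
The lead times named in the two checklists (support logs 7 days before, SyncIQ schedule changes one day before, forced policy runs one hour before) can be laid out against a planned start time. The sketch below is only a planning aid under those stated offsets; the function name, dictionary keys, and example date are hypothetical.

```python
from datetime import datetime, timedelta


def failover_milestones(planned_start):
    """Map each checklist milestone to its latest recommended time."""
    return {
        "submit_support_logs": planned_start - timedelta(days=7),
        "adjust_synciq_schedules": planned_start - timedelta(days=1),
        "force_run_synciq_policies": planned_start - timedelta(hours=1),
        "failover_start": planned_start,
    }


# Hypothetical maintenance window start.
planned = datetime(2025, 6, 14, 22, 0)
for name, when in failover_milestones(planned).items():
    print(f"{name}: {when:%Y-%m-%d %H:%M}")
```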

After Failover: Test Data Access

Failover Readiness

Eyeglass-assisted failover uses diagnostics to check if failover is possible or recommended. It updates a DR Dashboard to show your current state.

The DR Dashboard identifies when the following need attention:

  • Data sync issues
  • Configuration sync issues
  • SPN out-of-sync conditions
  • Invalid IP pool mapping for IP pool or Access Zone failover

The DR Dashboard also provides a SyncIQ readiness and DFS mode policy view. This helps you assess sub-Access Zone failover readiness versus the entire Access Zone. Eyeglass validates your DR readiness at set intervals. It notifies you through external alarms if problems occur.

The Eyeglass Runbook Robot feature offers another way to validate your readiness. It automates a failover on a non-production “EyeglassRunbookRobot” Access Zone or SyncIQ policy every night at midnight. It uses the same failover steps your environment would use. It notifies you through external alarms if problems occur.

This feature acts as a cluster witness and mounts the cluster over NFS. It writes and re-reads test data to confirm failover from the client’s perspective.

  • You can use it in basic or advanced modes.

  • The basic mode uses a SyncIQ policy for failover and does not include extra logic. It is easy to set up and provides quick failover and failback tests.

  • The advanced mode tests all logic and uses Access Zone failover. It uses the same NFS write and re-read method and also manages SPN, SmartConnect Zone mapping, and failover logic.

    tip

    For more details, see the Runbook Robot Guide.
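
The write and re-read check described above can be approximated from any client that has the export mounted. The following is a minimal sketch of that idea, not the Runbook Robot implementation: the function name and the mount path in the usage note are hypothetical, and the check assumes the path is already mounted and writable.

```python
import uuid
from pathlib import Path


def write_and_reread(mount_path):
    """Write a unique token under mount_path, read it back, and clean up.

    Returns True when the data read back matches the data written.
    """
    token = uuid.uuid4().hex
    probe = Path(mount_path) / f"failover_probe_{token}.txt"
    probe.write_text(token)
    try:
        return probe.read_text() == token
    finally:
        # Remove the probe file so repeated checks leave no test data behind.
        probe.unlink()
```

For example, calling write_and_reread("/mnt/dr_test") after a test failover (where /mnt/dr_test is a hypothetical NFS mount of the target cluster) gives a client-side confirmation that the failed-over path accepts writes.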

Quota Failover Options

  • Skip Quotas During Short Failovers
    During failover, skip quota failover by unchecking the quota checkbox. Quotas then stay on the source cluster. Use this option for short failovers, such as over a weekend, to avoid interference between quota scans and SyncIQ. Quotas are not required for failover testing and are safer to leave on the source cluster.

    note

    On failback, make sure to uncheck the quota failover option.

  • Enable Pre-Synchronization of Quotas
    In version 2.5.3 or later, you can enable the quota inventory job (see the CLI guide). This job collects quotas on a default schedule of twice per day. You can also pre-synchronize quotas with a quota sync schedule job. With pre-synchronization, quotas do not need to be failed over because they are already staged on the target DR cluster.

    note

    Use the skip quota option in DR Assistant if pre-syncing quotas.

Access Zone Failover

Eyeglass uses the Access Zone as the unit of failover when you do not use DFS mode or per-policy SyncIQ failover. This approach simplifies DR readiness planning and failover operations. You can fail over shares, exports, and quotas in this mode.

Access Zone failover also includes networking failover of SmartConnect Zones and any aliases that exist. Eyeglass must fail over all IP pools in the Access Zone, along with all aliases, all SyncIQ policies, and all shares, exports, and quotas at the same time. This prevents SPN collisions in Active Directory and blocks clients from mounting the source cluster after failover.

You must plan and map IP pools from the source to the target clusters before the Access Zone is ready for failover.

SMB authentication depends on the Active Directory machine account having the correct SPN values for SmartConnect Zones. Failover and authentication depend on SPNs registering with a writable cluster. Eyeglass Access Zone failover automates SPN management. It also creates SmartConnect Zone aliases so that clients can access data after a DNS update delegates the SmartConnect Zone to the PowerScale cluster.

info

DFS mode does not require DNS, SPN, or SmartConnect Zone changes during failover.

The following scenarios show cluster configuration before Access Zone failover. During normal operation, both primary and secondary clusters remain available. You prepare for failover by creating mapping hints.

info

Use Access Zone failover when you need a single unit to manage all data, networking, and authentication changes at once. This method provides a clear and controlled process for DR readiness and failover operations.

Eyeglass DR Assistant - Access Zone Failover

  1. Ensure no live access to data, or enable the Data Integrity failover option to disable access to SMB Shares before the failover.


  2. Begin the failover (Eyeglass automated).


  3. Perform validation (Eyeglass automated).


  4. Set configuration replication for policies to USERDISABLED (Eyeglass automated).


  5. Provide write access to data on the target (Eyeglass automated).


  6. Move the SmartConnect Zone to the target (Eyeglass automated).


  7. Update SPN to enable authentication against the target (Eyeglass automated).


  8. Repoint DNS to the target cluster's IP address using the post-failover script (Eyeglass automated with scripting).


  9. Refresh the session to apply DNS changes using the post-failover script (Eyeglass automated with scripting).

tip

For more details, consult the Access Zone Failover Guide.
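
After the DNS repoint step, one way to confirm that clients will reach the target cluster is to resolve the SmartConnect Zone name and check that every returned address falls inside the target IP pool. The sketch below assumes IPv4 and uses Python's standard socket and ipaddress modules; the function name, zone name, and subnet are hypothetical, and the resolver parameter exists only so the check can be exercised without a live DNS server.

```python
import ipaddress
import socket


def resolves_to_target(zone_name, target_pool_cidr, resolver=socket.gethostbyname_ex):
    """Return True if every address for zone_name is inside target_pool_cidr."""
    _, _, addresses = resolver(zone_name)
    network = ipaddress.ip_network(target_pool_cidr)
    # An empty answer counts as a failure, not as a match.
    return bool(addresses) and all(
        ipaddress.ip_address(address) in network for address in addresses
    )


# Offline example with a stand-in resolver (hypothetical name and addresses).
def fake_resolver(name):
    return (name, [], ["10.2.0.15"])


print(resolves_to_target("data.example.com", "10.2.0.0/24", resolver=fake_resolver))
```

Clients may still hold cached DNS answers or sessions, so a passing check here does not replace the session refresh step.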

IP Pool Failover

Eyeglass supports IP pools as a failover unit within an Access Zone. Selecting the IP pool as the failover unit simplifies DR readiness calculations and failover operations. With this mode, shares, exports, and quotas are included in failover.

IP pool failover manages networking failover for SmartConnect Zones and any existing SmartConnect Zone aliases. Eyeglass fails over all policies mapped to the pool using the IP pool policy mapping UI in the DR Dashboard. All SmartConnect names and aliases configured on the pool, along with mapped SyncIQ policies, shares, exports, and quotas, fail over together. During failover, Eyeglass renames (rather than deletes) source cluster zone names to avoid SPN collisions in Active Directory and to prevent clients from mounting the source cluster.

info
  • You must plan and map IP pools from the source to the target clusters before you mark the pools as ready for failover.

  • You must convert the Access Zone to IP pool failover. Every pool in the Access Zone must have a policy mapped to it before any pool in the zone can fail over.

SMB authentication relies on the AD machine account having correct SPN values for SmartConnect Zones. SPN registration must occur with a writable cluster for authentication and failover. Eyeglass automates SPN management and creates SmartConnect Zone aliases, enabling access with a DNS update that delegates the SmartConnect Zone to the PowerScale cluster.

note

DFS mode does not require DNS, SPN, or SmartConnect Zone changes during failover. You can fail over DFS IP pools using the Pool Failover feature.

Eyeglass DR Assistant - IP Pool Failover Summary

  1. Verify that no live access exists to data, or enable the Data Integrity failover option to disable access to SMB shares before failover.
  2. Start the failover process (automated by Eyeglass).
  3. Validate the failover (automated by Eyeglass).
  4. Set configuration replication policies to USERDISABLED (automated by Eyeglass).
  5. Enable write access to data on the target (automated by Eyeglass).
  6. Move the SmartConnect Zone to the target (automated by Eyeglass).
  7. Update the SPN to enable authentication on the target (automated by Eyeglass).
  8. Repoint DNS to the target cluster's IP address using a post-failover script (automated by Eyeglass with scripting).
  9. Refresh the session to apply the DNS changes using a post-failover script (automated by Eyeglass with scripting).
tip

For more information on this failover mode, see the IP Pool Failover section in Failover Configuration.

SyncIQ DFS Mode with Eyeglass

This mode supports easy failover and failback operations with quota failover and failback (excluding exports). It allows you to mount the writable copy of SyncIQ data without manual steps, DNS updates, remounts, or re-authentications.

This mode uses DFS folder UNC targets with the same share name, and a SmartConnect zone for each cluster configured to use both clusters. Eyeglass ensures that shares exist on only one cluster at a time and moves them during failover events. When Eyeglass creates the shares, DFS activates the target folder path to the secondary cluster.

note

You can use two different SmartConnect zones on the source and destination clusters. This setup requires no changes to either cluster during failover. The figure below shows a typical DFS folder setup.

[Figure: DFS mode failover with DR Assistant (dfs_mode_failover_dr_assistant)]

DFS Mode Failover - Eyeglass DR Assistant

  1. Ensure no live access to data, or enable the Data Integrity failover option to disable access to SMB shares before failover.
  2. Begin failover (Eyeglass automated).
  3. Perform validation (Eyeglass automated).
  4. Set configuration replication for policies to USERDISABLED (Eyeglass automated).
  5. Provide write access to data on the target (Eyeglass automated).
  6. Moving the SmartConnect zone to the target is not required in DFS mode.
  7. Updating the SPN for authentication on the target is not required in DFS mode.
  8. Repointing DNS to the target cluster IP address is not required in DFS mode (in other modes, a post-failover script performs this step).
  9. Fail over shares and quotas: Eyeglass creates them on the target and deletes them from the source (Eyeglass automated).
  10. DFS clients switch to the DR cluster when the second DFS folder UNC target path is available.
tip

For details on this failover mode, see the Microsoft DFS Mode Failover Guide.

SyncIQ Mode with Eyeglass

Use this mode when you need targeted failover. It lets you fail over specific policies without failing over an entire Access Zone. It does not manage SPNs, so it works best with NFS exports and quotas. Shares and exports are already synced with Eyeglass, so both resources are ready during failover.

Key Differences

  • Unlike Access Zone failover, this mode does not automate SmartConnect Zone failover.
  • You must fail over selective SmartConnect Zones by using SmartConnect Zone aliases, then updating DNS.

Post-Failover Scripts

This mode supports a post-failover script engine. It lets you run scripts on hosts to unmount and remount file systems after failover.

  • Sample scripts are available.
  • Superna Professional Services can create custom host-side scripts.
  • Review the Script Engine Overview section in the Eyeglass Administration Guide.
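
A host-side remount after failover usually comes down to an unmount followed by a mount against the target cluster's SmartConnect Zone. The sketch below only builds and optionally runs those two standard commands; it is not the Eyeglass script engine interface or one of the sample scripts, and the function names, mount point, zone name, and export path are hypothetical.

```python
import subprocess


def remount_commands(mount_point, target_zone, export_path):
    """Build the unmount and NFS mount commands for one file system."""
    return [
        ["umount", mount_point],
        ["mount", "-t", "nfs", f"{target_zone}:{export_path}", mount_point],
    ]


def run_commands(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops at the first command that fails.
            subprocess.run(command, check=True)


# Dry run with hypothetical values; nothing is unmounted or mounted.
run_commands(remount_commands("/mnt/data", "dr.example.com", "/ifs/data"))
```

Keeping the default as a dry run lets you review the exact commands for each host before the failover day.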

DNS And Automation Considerations

You can run these scripts without updating DNS. The target cluster’s SmartConnect Zone can mount directly after the SyncIQ policy is set to writable on the target cluster.

Use this mode if you have fewer than 30 hosts. If you have more than 30 hosts, consider Access Zone failover and DNS updates.

Additional Details

Diagrams show the failover flow and sample commands used during Eyeglass policy failover. SPN commands appear if SMB manual failover occurs. For more information on this failover mode, consult the SyncIQ Policy Failover Guide.

DR Rehearsal Mode

When you use DR Rehearsal Mode, you pause the normal data replication process (SyncIQ). This pause lets you write new data to the target cluster for testing without disrupting your production cluster, which keeps running with its own SmartConnect name. You only have one copy of the data during rehearsal. When you disable DR Rehearsal Mode, the target cluster discards any test changes and then pulls fresh data from your production cluster. Use a different DNS name when you mount data to avoid confusion.

Pros

Benefit | Description
Faster Failover and Testing | You can test your DR procedures without delay.
Production Continues Running | Your production workloads remain available.
AD and Network Cloning Is Possible | You can mirror production settings.

Cons


Issue | Description
Data Synchronization | Your test changes do not sync after testing.

FAQs

Do I Need to Remount Shares With Access Zone Failover or Pool Failover?

This is a common question. With Access Zone and IP pool failover, the data integrity failover option reduces the impact on client machines. The feature disconnects the NetBIOS session for all shares involved in a failover, which also removes the cached IP session to the source cluster. After the DNS redirect step completes, Windows machines can mount the DR cluster correctly, with the exceptions listed below.

Windows machines can re-establish the NetBIOS session and query DNS to get an IP address from the DR cluster. This avoids a remount requirement on the Windows machine.

Limitations

  • A machine with an open file continues to cache the source cluster NetBIOS session.

  • Machines with no open files can switch clusters without a remount.

  • A user who is actively browsing the failed-over share in Explorer still has the source cluster NetBIOS session cached and must remount the share.

How Do I Determine the Best Approach for Quotas During Failover?

Quotas present some challenges for failover with OneFS 8.x. The quota scan job runs as soon as new quotas are created and sets a flag on each newly created quota to indicate when its quota domain has been created. SyncIQ operations that conflict with quotas are marked with a flag indicating that the quota domain has not been created yet. This can cause SyncIQ operations to fail during the make writable or resync prep steps of a failover.