Version: 4.0.0

Simulated Disaster Event Procedure

Introduction

This document outlines the Eyeglass simulated disaster test scenarios for clients who want to perform disaster testing during a scheduled maintenance window. It covers the following cases:

  1. Controlled failover with production SyncIQ policies, and uncontrolled failover with a DFS mode Test SyncIQ policy, without impacting or exposing production data to data loss or resync risks.
  2. Controlled failover of the production Access Zone, and uncontrolled failover with the EyeglassRunbookRobot Access Zone, without impacting or exposing production data to data loss or resync risks.
OPTIONAL

You can also use this procedure to perform a simulated failover without failing over any production data in the same maintenance window.

RECOMMENDATION

Open a case with Superna Support to review this document before attempting the procedures within it.

Support Statement on the Use of this Procedure

  1. This procedure is the only supported process. Any variant of the process that uses uncontrolled failover on production data is unsupported.
    a. If a support request is raised for a test that intentionally used uncontrolled failover on production data, the customer will have to take responsibility for recovery steps using documentation without support assistance.
  2. This procedure requires the “Failover Planning Guide and Checklist” to be followed to maintain support as per the support contract for planned failovers.
    a. The Failover Planning Guide and checklist will be requested for validation from Superna support when any case is opened regarding failover.
note

Production SyncIQ policies should protect only directories that are outside the EyeglassRunbookRobot Access Zone and outside the path protected by the test DFS SyncIQ policy.

Initial Environment Setup

info

If you have already configured the Eyeglass RunbookRobot feature in your environment for Access Zone or DFS Continuous DR Testing, you may skip this initial environment setup section and proceed to the Verify Environment Setup section.

warning

Only one runbook robot is supported per Eyeglass instance.

Common Setup for Both DFS and Access Zone Failover

  1. Access Zone Creation:

    • On both Cluster1 and Cluster2, create an Access Zone with the name format: EyeglassRunbookRobot-XXXX (where XXXX is a string or number of your choice).
  2. IP Pool for SyncIQ Data Replication:

    • On both Cluster1 and Cluster2, create an IP pool dedicated to SyncIQ data replication.
      • Ensure the Replication IP pool is in the System Access Zone.
      • When configuring SyncIQ policies, select the option to "Run the policy only on nodes in the specified subnet and pool," and choose the dedicated SyncIQ IP pool.
  3. IP Pool Aliases:

    • On both Cluster1 and Cluster2, configure SmartConnect zone aliases for the IP pools:
      • Use the alias format igls-ignore-xxxx for the SyncIQ IP pool, where xxxx ensures the alias is unique across the infrastructure.
  4. Test SyncIQ Policy Creation:

    • Create a SyncIQ policy on Cluster1 with the name format: EyeglassRunbookRobot-yyyy, where yyyy is a string or number of your choice.
    • Note: Only one SyncIQ policy is allowed per EyeglassRunbookRobot-XXXX Access Zone.
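
The naming formats in the common setup steps can be checked before anything is configured on the clusters. The following Python sketch validates candidate names against the EyeglassRunbookRobot-XXXX and igls-ignore-xxxx formats and flags repeated aliases. The allowed suffix characters (letters, digits, underscore) and the function names are assumptions made for this example; the procedure itself only says the suffix is a string or number of your choice.

```python
import re
from typing import List, Set

# Assumption: suffixes are non-empty and use letters, digits, or underscores.
# The procedure only states "a string or number of your choice".
ROBOT_NAME = re.compile(r"EyeglassRunbookRobot-[A-Za-z0-9_]+")
IGNORE_ALIAS = re.compile(r"igls-ignore-[A-Za-z0-9_]+")


def is_robot_name(name: str) -> bool:
    """True if name follows the EyeglassRunbookRobot-XXXX format."""
    return ROBOT_NAME.fullmatch(name) is not None


def is_ignore_alias(alias: str) -> bool:
    """True if alias follows the igls-ignore-xxxx format."""
    return IGNORE_ALIAS.fullmatch(alias) is not None


def duplicate_aliases(aliases: List[str]) -> Set[str]:
    """Return aliases (lowercased) that appear more than once.

    DNS names are case-insensitive, so the comparison ignores case.
    """
    seen: Set[str] = set()
    dupes: Set[str] = set()
    for alias in aliases:
        key = alias.lower()
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return dupes
```

For example, `is_robot_name("EyeglassRunbookRobot-DFSzone")` is true, and passing the full list of planned igls-ignore aliases from both clusters to `duplicate_aliases` shows any alias that is not unique across the infrastructure.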

Additional Setup for DFS

  1. Client Access IP Pool:
    • On both Cluster1 and Cluster2, create an IP pool for client access.
    • Configure the IP pool with a SmartConnect zone alias in the format: igls-ignore-xxxx (specific to DFS client access).

Additional Setup for Access Zone Failover

  1. Access Zone Failover Logic:
    • On both Cluster1 and Cluster2, configure the SmartConnect zone alias in the format: igls-aaaa-bbbb, where:
      • aaaa is the same string on both Cluster1 and Cluster2.
      • bbbb is another string to make the alias unique as a whole.
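
The igls-aaaa-bbbb rule can be expressed as a small check on a pair of aliases, one from each cluster. This Python sketch is illustrative only: it assumes the aaaa part contains no hyphen (so the alias can be split unambiguously), and the function and type names are hypothetical.

```python
from typing import NamedTuple


class MappingAlias(NamedTuple):
    hint: str    # the "aaaa" part, identical on Cluster1 and Cluster2
    suffix: str  # the "bbbb" part, makes the whole alias unique


def parse_mapping_alias(alias: str) -> MappingAlias:
    """Split an alias in igls-aaaa-bbbb format.

    Assumption: aaaa contains no hyphen; everything after it is bbbb.
    """
    parts = alias.split("-", 2)
    if len(parts) != 3 or parts[0] != "igls" or not parts[1] or not parts[2]:
        raise ValueError(f"not in igls-aaaa-bbbb format: {alias!r}")
    if parts[1] == "ignore":
        # igls-ignore-xxxx is the separate format used for ignored pools.
        raise ValueError(f"igls-ignore alias is not a mapping alias: {alias!r}")
    return MappingAlias(parts[1], parts[2])


def aliases_pair_up(cluster1_alias: str, cluster2_alias: str) -> bool:
    """True if both aliases share aaaa and differ as a whole."""
    first = parse_mapping_alias(cluster1_alias)
    second = parse_mapping_alias(cluster2_alias)
    return first.hint == second.hint and cluster1_alias != cluster2_alias
```

With these assumptions, `aliases_pair_up("igls-robot-c1", "igls-robot-c2")` holds, while a pair with different aaaa strings, or two identical aliases, does not.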

Network Mapping Example

  • SmartConnect Alias: igls-ignore-xxxx
  • Access Zone Alias: igls-aaaa-bbbb

Refer to the Eyeglass Zone Readiness section for further details on mapping pools from Cluster1 to Cluster2.

Verify Environment Setup

  1. Verify Production DFS Policies
    Ensure production DFS policies are configured with dual folder targets.
  2. Verify Test DFS Policy
    • Write data to the share included in the DFS namespace, protected by the EyeglassRunbookRobot DFS SyncIQ Policy, on a Windows client.
    • Confirm Cluster 1 contains the active folder target using the DFS tab in the folder properties in Windows Explorer.
  3. Verify Test Non-DFS Policy
    Perform a write test for the non-DFS policy in the Access Zone. This validates DNS resolution to the correct cluster.
  4. Access Zone Mode
    Verify Active Directory (AD) permissions by following the document:
    How to Validate AD Cluster Delegation is Ready for Failover and Failback of SPNs.
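
The write tests in the verification steps can be made repeatable with a short script run from the test client. This is a minimal sketch, not part of Eyeglass: it writes a uniquely named marker file into a directory and reads it back. The function name and the share path in the usage note are hypothetical.

```python
import time
import uuid
from pathlib import Path


def write_test(share_dir: str) -> Path:
    """Write a uniquely named marker file into share_dir and read it back.

    Raises OSError if the directory is not writable or the content read
    back does not match what was written.
    """
    marker = Path(share_dir) / f"dr-write-test-{uuid.uuid4().hex}.txt"
    payload = f"write test at {time.strftime('%Y-%m-%dT%H:%M:%S')}\n"
    marker.write_text(payload, encoding="utf-8")
    if marker.read_text(encoding="utf-8") != payload:
        raise OSError(f"read-back mismatch for {marker}")
    return marker
```

On a Windows client this could be called as `write_test(r"\\example.test\dfsroot\robot-share")`, where the UNC path is a placeholder for your own DFS or SMB mount. The script only proves that the write succeeded; which cluster holds the active target still has to be confirmed as described in the steps above.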

Simulated Disaster Scenario - DFS Test SyncIQ Policy Failover

Follow these steps to simulate a DFS policy failover in uncontrolled mode, designed to replicate a disaster recovery (DR) event. This procedure assumes a DFS mode policy has been enabled inside the Runbook Robot Access Zone.

info
  • You can perform this test with or without production data failover during the same maintenance window.
  • Before beginning the steps for simulating a DFS Test SyncIQ policy failover, ensure you have completed the following sections:
    • Initial Environment Setup
    • Verify Environment Setup
    • Support Statement

Pre Simulated Disaster (DFS Test SyncIQ)

Cluster1 (Prod Cluster) Is Available: Controlled Failover

  1. Review all steps in Failover Planning with Superna. Completing this step is necessary to maintain support for this procedure.

  2. Perform a Microsoft DFS controlled failover for Production SyncIQ policies from Cluster1 to Cluster2 using Eyeglass.

    info

    The uncontrolled test policy will not be failed over during this step.

  3. Enable the Production SyncIQ mirror-policy job in the Jobs window on Eyeglass if it is in the USERDISABLED state after failover.

  4. Write data to production shares protected by the Production SyncIQ Policy through a DFS mount.

    • Confirm that the Cluster2 share path is the active target on the Windows Client after the controlled failover.
  5. Complete the Controlled Failover

    • Verify that the production data controlled failover is complete.
    • Refer to the sample DFS readiness in the DR Dashboard for confirmation.

Simulated Disaster (DFS Test SyncIQ)

Cluster1 (Prod Cluster) Becomes Unavailable: Uncontrolled Failover

  1. Simulate Cluster1 Failure

    • On the Cluster1 OneFS UI, remove the node interfaces from the dedicated IP pool used for client access to the EyeglassRunbookRobot-DFSzone Access Zone.
    • This action simulates a DNS response failure to the EyeglassRunbookRobot-DFSzone Access Zone without impacting SSIP or normal DNS operations.
    • At this point, name resolution is down, and NetBIOS sessions are disconnected from the EyeglassRunbookRobot-DFSzone Access Zone.
    info

    The DNS response failure means that the SmartConnect zone name no longer resolves (for example, queries return a SERVFAIL error). This confirms the simulated disaster scenario.

  2. Set the Test Policy Schedule to Manual

    • On Eyeglass, set the schedule for the EyeglassRunbookRobot DFS Test Policy to manual. This prevents unnecessary policy execution if the source cluster is unavailable.
    • Ensure this step is complete before proceeding.
  3. Perform Uncontrolled Failover

    • Use Superna Eyeglass to perform an uncontrolled failover for the EyeglassRunbookRobot Test DFS SyncIQ Policy from Cluster1 to Cluster2.

    1. Open the Failover Wizard

      • Access the Failover Wizard in the DR Assistant interface.
      • Select "Microsoft DFS" as the failover type and choose the source cluster, such as ademola-s12.
      • Ensure "Controlled Failover" is unchecked to perform an uncontrolled failover.
      • Click Next to continue.
    2. Review Best Practices

      • Follow the suggested steps to optimize the failover:
        • Run the domain mark in advance to reduce failover time.
        • Increase SyncIQ worker threads to 10 or more for improved performance.
      • Click Next after reviewing the recommendations.
    3. Select the DFS Policy

      • Select the test DFS policy for failover, such as "EyeglassRunbookRobot-DFSzone."
      • Verify the "DR Failover Status" is "OK" and check the last successful readiness time.
      • Click Next to proceed.
    4. Validate Configuration

      • Confirm that Eyeglass has validated your configuration:
        • Ensure the DFS policy for failover, such as ademola-s12_EyeglassRunbookRobot-DFS, is ready.
        • Verify no disabled policies exist.
      • Check the box to acknowledge you have reviewed the "failover release notes."
      • Click Next to continue.
    5. Run the Failover

      • Review the summary details, including the selected DFS policy, source cluster, and target cluster.
      • Acknowledge the warnings about potential data loss and confirm you understand the implications of making the source cluster read-only and the target cluster read-write.
      • Check the box indicating your readiness to initiate failover.
      • Click Run Failover to start the process.
    6. Monitor the Failover Progress

      • Go to the "Running Failovers" section to track the progress.
      • Confirm the job status for "DFS Policy Failover: EyeglassRunbookRobot-DFSzone" shows as "FINISHED."
      • Verify the start time, finish time, and duration to ensure successful completion.
      • Click Logs to review details if needed.
    7. Wait Until Uncontrolled Failover Completes

      • Confirm the failover process is complete by checking the status in the "Running Failovers" section.
      • Ensure the job is marked "FINISHED," and there are no errors in the logs.
      • Validate that the DFS policy is operational on the target cluster.
  4. Test Data Access

    • Write data to the share protected by the EyeglassRunbookRobot Test DFS SyncIQ Policy using the DFS mount.
    • Confirm that the Cluster2 share path is now the active target on the Windows Client.
  5. Uncontrolled DFS Failover Is Complete

    • Verify that the simulated disaster scenario has been successfully executed and Cluster2 is fully active for the test policy.
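
The name-resolution failure expected in step 1 (Simulate Cluster1 Failure) can be confirmed from a client with a short script instead of a manual lookup. The sketch below uses Python's standard `socket.getaddrinfo`; the `resolver` parameter exists only so the function can be exercised without a live DNS server, and the SmartConnect name in the usage note is a placeholder.

```python
import socket
from typing import Callable, List


def resolve_ipv4(name: str, resolver: Callable = socket.getaddrinfo) -> List[str]:
    """Return the sorted IPv4 addresses for name, or [] if it does not resolve.

    A failed lookup (for example a SERVFAIL response from the DNS server)
    surfaces in Python as socket.gaierror.
    """
    try:
        infos = resolver(name, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is (address, port).
    return sorted({info[4][0] for info in infos})


def is_unresolvable(name: str, resolver: Callable = socket.getaddrinfo) -> bool:
    """True if the SmartConnect zone name currently returns no address."""
    return not resolve_ipv4(name, resolver)
```

After the node interfaces are removed from the pool, `is_unresolvable("robot-dfs.example.test")` (with your own SmartConnect zone name) should return `True`. Client-side DNS caching can delay this result, so treat a `False` immediately after the change with some caution.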

Post Simulated Disaster (DFS Test SyncIQ)

Cluster1 Recovery Steps for DFS

Follow these steps to restore the policies that were failed over in uncontrolled mode to a working state. Production data is currently failed over to Cluster2 through the controlled failover performed earlier. Some users may choose to keep production on Cluster2 temporarily before planning a failback. The test policies can be recovered by following the steps below:

  1. Simulate Cluster1 Returning to Service

    1. Rename Shares:

      • On the Cluster1 OneFS UI, rename shares within the Test SyncIQ policy path to use the format igls-dfs-<sharename>.
      • Perform this step after the "uncontrolled failover" phase.
    2. Reconnect Node Interfaces:

      • On the Cluster1 OneFS UI, reconnect the previously removed node interfaces to the IP pool used for DFS client access.
    3. Run Resync Prep:

      • On the Cluster1 OneFS UI, execute the resync-prep command for the EyeglassRunbookRobot-DFS Test SyncIQ Policy (refer to EMC documentation).
      • Verify that the resync-prep process completes without errors before continuing.
    note

    Ensure the resync-prep process completes without errors before proceeding to the next steps.

  2. Verify SyncIQ Reports:

    • Check the SyncIQ reports tab on Cluster1 OneFS to ensure all steps were completed successfully.
  3. Perform Controlled Failback
    Follow these steps to perform a controlled failback for the EyeglassRunbookRobot-DFS Test SyncIQ mirror-policy:

    1. Open the Failover Wizard

      • Access the Failover Wizard in the DR Assistant interface.
      • Select "Microsoft DFS" as the failover type and choose the source cluster, such as ademola-d12.
      • Ensure the Controlled Failover box is checked for a controlled failback operation.
      • Click Next to continue.
    2. Review Failover Options

      • Verify the following failover options are enabled:
        • "Data Sync" to synchronize any data changes.
        • "Config Sync" to ensure configuration changes are applied.
        • "SyncIQ Resync Prep" to prepare the SyncIQ policies for the failback.
      • Leave the "Disable SyncIQ Jobs on Failover Target" option unchecked unless specifically required for the failback scenario.
    3. Select the DFS Policy

      • Select "EyeglassRunbookRobot-DFS" or the relevant DFS SyncIQ policy from the list.
      • Confirm the "DR Failover Status" is "OK" and check the last successful readiness time.
      • Click Next to proceed.
    4. Validate Configuration

      • Confirm that Eyeglass has validated your configuration:
        • Ensure the DFS SyncIQ policy for failback, such as ademola-d12_EyeglassRunbookRobot-DFS, is ready.
        • Verify no disabled policies exist.
      • Check the box to acknowledge you have reviewed the "failover release notes."
      • Click Next to continue.
    5. Run the Failback

      • Review the summary details, including the selected DFS SyncIQ policy, source cluster, and target cluster.
      • Acknowledge the warnings about potential data loss and confirm you understand the implications of making the source cluster read-only and the target cluster read-write.
      • Check the box indicating your readiness to initiate the controlled failback.
      • Click Run Failover to start the process.
    6. Monitor the Failback Progress

      • Go to the "Running Failovers" section to track the progress.
      • Confirm the job status for "DFS Policy Failback: EyeglassRunbookRobot-DFS" shows as "FINISHED."
      • Verify the start time, finish time, and duration to ensure successful completion.
      • Click Logs to review details if needed.
    7. Wait Until Controlled Failback Completes

      • Confirm the failback process is complete by checking the status in the "Running Failovers" section.
      • Ensure the job is marked "FINISHED," and there are no errors in the logs.
      • Validate that the DFS SyncIQ policy is operational on the original source cluster.
  4. Write Data to the Test Policy Share

    • Write data to the share protected by the EyeglassRunbookRobot DFS Test SyncIQ Policy from a DFS mount.
    • Confirm that the Cluster1 share path is the active target on the Windows Client after the controlled failback.
  5. Perform Production Controlled Failback

    • Perform a Microsoft DFS-type controlled failback of all production SyncIQ mirror-policies from Cluster2 to Cluster1 using Superna Eyeglass.
  6. Validate Production Data

    • Write data to the shares protected by the Production SyncIQ Policy from a DFS mount.
    • Confirm that the Cluster1 share path is the active target on the Windows Client after the controlled failback.
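
The share rename in step 1 (Rename Shares) follows a fixed pattern, igls-dfs-<sharename>, so the target names can be worked out in advance. The Python sketch below only computes the names; it does not talk to OneFS, and the function names are hypothetical.

```python
from typing import Dict, Iterable

PREFIX = "igls-dfs-"


def renamed_share(share_name: str) -> str:
    """Return the share name in igls-dfs-<sharename> format.

    A name that already carries the prefix is returned unchanged, so the
    function can be applied twice without double-prefixing.
    """
    if share_name.startswith(PREFIX):
        return share_name
    return PREFIX + share_name


def rename_plan(shares: Iterable[str]) -> Dict[str, str]:
    """Map each share that still needs renaming to its new name."""
    return {s: renamed_share(s) for s in shares if not s.startswith(PREFIX)}
```

For instance, `rename_plan(["projects", "igls-dfs-hr"])` yields `{"projects": "igls-dfs-projects"}`, which can serve as a checklist while renaming the shares within the Test SyncIQ policy path in the Cluster1 OneFS UI.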

Simulated Disaster Scenario - Eyeglass Runbook Robot Access Zone Failover

Follow these steps to simulate an Access Zone failover in uncontrolled mode, designed to replicate a disaster recovery (DR) event. This procedure assumes dual delegation is implemented and the Runbook Robot Access Zone is fully operational.

info
  • You can perform this test with or without a production data failover during the same maintenance window.

  • Before beginning the steps for simulating an Access Zone failover, ensure you have completed the following sections:

    • Initial Environment Setup
    • Verify Environment Setup
    • Support Statement

Pre Simulated Disaster (Eyeglass Runbook)

Cluster1 (Prod Cluster) Is Available: Controlled Failover

  1. Review every step in the Failover Planning with Superna before you begin. You must complete this to maintain support for this procedure. Refer to the support statement in the guide.

  2. Use Eyeglass to perform a controlled failover of your Production Access Zone(s) from Cluster1 to Cluster2.

  3. On Eyeglass, enable each Production SyncIQ policy job for your Production Access Zone(s) if they are in the USERDISABLED state.

    tip

    Consult the Failover Planning with Superna to maintain support.

  4. Write data to shares protected by Production SyncIQ policies from the DFS mount. Confirm that the Cluster2 share path is the active target on a Windows client after the controlled failover of your Production Access Zone(s).

  5. Do not proceed until you confirm that the failover is successful.

  6. Do not fail over the EyeglassRunbookRobot-SMBzone Access Zone on Cluster1 during this phase.

  7. This phase is complete once the controlled failover of your Production Access Zone(s) has finished. Do not continue to the next steps before then.

Simulated Disaster (Eyeglass Runbook)

Cluster1 Unavailability and EyeglassRunbookRobot-SMBzone Access Zone Impact

  1. Simulate Cluster1 Failure

    • Review the Cluster1 to Cluster2 IP pool mapping before initiating the disaster simulation.
    • On the Cluster1 OneFS UI, disconnect the node interfaces from the dedicated IP pool used for client access to the EyeglassRunbookRobot-SMBzone Access Zone. If the environment is configured properly, the SMB target path on Cluster1 fails once the node interfaces are removed: SMB sessions are disconnected and SMB mounts fail.
  2. DNS Simulation

    • Removing Cluster1 IPs from the pool simulates a DNS response failure without impacting SSIP or normal DNS operations. At this stage, name resolution is down, and NetBIOS sessions to Cluster1 EyeglassRunbookRobot-SMBzone Access Zone are disconnected.

      ademola-igls4:/home/admin # nslookup robot-smb.ademola-src1-smb.ad2.test
      Server: 127.0.0.1
      Address: 127.0.0.1#53

      ** server can't find robot-smb.ademola-src1-smb.ad2.test: SERVFAIL
    • The SERVFAIL response confirms that name resolution to the SmartConnect zone is down. At this stage, the disaster scenario has been successfully simulated: the SmartConnect zone name of the Cluster1 EyeglassRunbookRobot-SMBzone Access Zone does not resolve, and no shares can be accessed on Cluster1.

  3. Adjust EyeglassRunbookRobot-SMB Test Policy

    • Set the schedule for the EyeglassRunbookRobot-SMB Test policy on Cluster1 to manual. This ensures policies will not run unnecessarily if the source cluster is unavailable.
    note

    In an actual disaster, the source cluster is assumed to be unreachable. Record the current schedule so that you can reapply it after completing the procedure.

  4. Execute Failover

    1. Open the Failover Wizard

      • Access the Failover Wizard in the DR Assistant interface.
      • Select "Access Zone" as the failover type and choose the source cluster, such as ademola-s12.
      • Ensure "Controlled Failover" is unchecked to perform an uncontrolled failover.
      • Click Next to continue.
    2. Review Best Practices

      • Follow the suggested steps to optimize the failover:
        • Run the domain mark in advance to reduce failover time.
        • Increase SyncIQ worker threads to 10 or more for improved performance.
      • Click Next after reviewing the recommendations.
    3. Select the Access Zone

      • Select "EyeglassRunbookRobot-SMBzone" from the list of available Access Zones.
      • Verify the "DR Failover Status" is "OK" and check the last successful readiness time.
      • Click Next to proceed.
    4. Validate Configuration

      • Confirm that Eyeglass has validated your configuration:
        • Ensure the Access Zone policy for failover, such as ademola-s12_EyeglassRunbookRobot-SMB, is ready.
        • Verify no disabled policies exist.
      • Check the box to acknowledge you have reviewed the "failover release notes."
      • Click Next to continue.
    5. Run the Failover

      • Review the summary details, including the selected Access Zone, source cluster, and target cluster.
      • Acknowledge the warnings about potential data loss and confirm you understand the implications of making the source cluster read-only and the target cluster read-write.
      • Check the box indicating your readiness to initiate failover.
      • Click Run Failover to start the process.
    6. Monitor the Failover Progress

      • Go to the "Running Failovers" section to track the progress.
      • Confirm the job status for "Access Zone Failover: EyeglassRunbookRobot-SMBzone" shows as "FINISHED."
      • Verify the start time, finish time, and duration to ensure successful completion.
      • Click Logs to review details if needed.
    7. Wait Until Uncontrolled Failover Completes

      • Confirm the failover process is complete by checking the status in the "Running Failovers" section.
      • Ensure the job is marked "FINISHED," and there are no errors in the logs.
      • Validate that the Access Zone is operational on the target cluster.
  5. Validate the Failover

    • Verify that SPNs are correctly failed over in Active Directory using ADSI Edit.
    • Test DNS resolution with nslookup to confirm it now points to Cluster2.
    • Address any issues with SmartConnect name resolution before proceeding.
  6. Test Client Access

    • Reboot the client machine used to validate the share before the disaster to ensure no NetBIOS session persists with Cluster1.
    • Remount the share and confirm data access through SMB.
    • Write data to the share protected by the EyeglassRunbookRobot-SMB Test SyncIQ Policy.
  7. Complete the Procedure

    • Confirm that the uncontrolled failover process is complete.
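
The nslookup checks in steps 2 and 5 above can be evaluated with a script when the output is captured to a file or variable. The following Python sketch classifies nslookup output as a failure (such as SERVFAIL) or a successful answer and compares the answer addresses with a list of expected pool IPs. It assumes the common Linux nslookup output layout shown in step 2; the function names and any IP addresses are hypothetical.

```python
import re
from typing import Iterable, List


def answer_addresses(output: str) -> List[str]:
    """Return the addresses listed after the first 'Name:' line.

    The 'Address:' line that belongs to the DNS server itself appears
    before any 'Name:' line and is therefore skipped.
    """
    addresses: List[str] = []
    in_answer = False
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Name:"):
            in_answer = True
        elif in_answer and line.startswith("Address:"):
            addresses.append(line.split(":", 1)[1].strip())
    return addresses


def classify_nslookup(output: str) -> str:
    """Return the failure code (e.g. 'SERVFAIL'), 'RESOLVED', or 'UNKNOWN'."""
    failure = re.search(r"\*\* server can't find .*: (\w+)", output)
    if failure:
        return failure.group(1)
    return "RESOLVED" if answer_addresses(output) else "UNKNOWN"


def points_to(output: str, expected_ips: Iterable[str]) -> bool:
    """True if the lookup resolved and every address is in expected_ips."""
    addresses = answer_addresses(output)
    return bool(addresses) and set(addresses) <= set(expected_ips)
```

Fed the output shown in step 2, `classify_nslookup` returns `"SERVFAIL"`. After the failover, `points_to(output, cluster2_pool_ips)` can be used to confirm that the SmartConnect name now resolves only to addresses from the Cluster2 pool, where `cluster2_pool_ips` is a list you supply from your own IP pool mapping.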

Post Simulated Disaster (Eyeglass Runbook)

Cluster1 (Prod Cluster) Becomes Available: EyeglassRunbookRobot-SMBzone Access Zone

  1. Simulate Cluster1 Availability

    • Review the Cluster2 to Cluster1 IP pool mapping for the EyeglassRunbookRobot-SMBzone Access Zone.
    • Ensure the previously removed node interfaces have not yet been reconnected.
  2. Update SmartConnect Name

    warning

    This step is required before you reconnect the previously removed node interfaces.

    • On the Cluster1 OneFS UI, edit the SmartConnect name for the EyeglassRunbookRobot-SMBzone Access Zone.
    • Add the prefix igls-original to the current SmartConnect name.
  3. Reconnect Node Interfaces

    • Reconnect the previously removed node interfaces to the Cluster1 IP pool used for client access.
  4. Run Resync Prep

    • On the Cluster1 OneFS UI, run resync-prep for the EyeglassRunbookRobot-SMB Test SyncIQ Policy.
    • Verify the resync-prep completes without errors. Resolve any issues before continuing.
  5. Verify Eyeglass Job States

    • Confirm the following:
      • The Cluster2 mirror policy is "Enabled."
      • The Cluster1 policy is in a "Disabled" state.
    • Allow Eyeglass Configuration Data Replication to run at least once to ensure updates.
  6. Confirm Job Completion

    • Ensure Configuration Data Replication has completed successfully by checking the running jobs window.
  7. Run Zone Failover Readiness Jobs

    • From the Eyeglass Jobs window, select "Run Now" from the bulk action menu to execute the Zone Failover Readiness Audit.
  8. Verify Zone Readiness

    • Use the Eyeglass DR Dashboard to confirm the readiness status for the EyeglassRunbookRobot-SMBzone Access Zone is "Good."
  9. Perform Controlled Failback

    • Perform a controlled failback of the EyeglassRunbookRobot-SMBzone Access Zone from Cluster2 to Cluster1.

    1. Open the Failover Wizard

      • Access the Failover Wizard in the DR Assistant interface.
      • Select "Access Zone" as the failover type and choose the source cluster, such as ademola-d12.
      • Ensure the Controlled Failover box is checked for a controlled failback operation.
      • Click Next to continue.
    2. Review Failover Options

      • Verify the following failover options are enabled:
        • "Data Sync" to synchronize any data changes.
        • "Config Sync" to ensure configuration changes are applied.
        • "SyncIQ Resync Prep" to prepare the SyncIQ policies for the failback.
      • Leave the "Disable SyncIQ Jobs on Failover Target" option unchecked unless specifically required for the failback scenario.
    3. Select the Access Zone

      • Select "EyeglassRunbookRobot-SMBzone" or the relevant Access Zone from the list.
      • Confirm the "DR Failover Status" is "OK" and check the last successful readiness time.
      • Click Next to proceed.
    4. Validate Configuration

      • Confirm that Eyeglass has validated your configuration:
        • Ensure the Access Zone policy for failback, such as ademola-d12_EyeglassRunbookRobot-SMB, is ready.
        • Verify no disabled policies exist.
      • Check the box to acknowledge you have reviewed the "failover release notes."
      • Click Next to continue.
    5. Run the Failback

      • Review the summary details, including the selected Access Zone, source cluster, and target cluster.
      • Acknowledge the warnings about potential data loss and confirm you understand the implications of making the source cluster read-only and the target cluster read-write.
      • Check the box indicating your readiness to initiate the controlled failback.
      • Click Run Failover to start the process.
    6. Monitor the Failback Progress

      • Go to the "Running Failovers" section to track the progress.
      • Confirm the job status for "Access Zone Failback: EyeglassRunbookRobot-SMBzone" shows as "FINISHED."
      • Verify the start time, finish time, and duration to ensure successful completion.
      • Click Logs to review details if needed.
    7. Wait Until Controlled Failback Completes

      • Confirm the failback process is complete by checking the status in the "Running Failovers" section.
      • Ensure the job is marked "FINISHED," and there are no errors in the logs.
      • Validate that the Access Zone is operational on the original source cluster.
  10. Wait for Failback Completion

    • Allow the controlled failback process to finish.
  11. Validation

    • Verify SPNs have failed over correctly in Active Directory using ADSI Edit.
    • Use nslookup to confirm DNS resolution now points to Cluster1.
    • Address any SmartConnect name resolution issues.
  12. Test Client Access

    • Reboot the client machine used for pre-disaster validation to clear any lingering NetBIOS sessions with Cluster2.
    • Mount the share and write data to the share protected by the EyeglassRunbookRobot-SMB Test SyncIQ Policy.
  13. Complete the Procedure

    • The controlled failback process is now complete.
    • For production data failback, refer to the planning guide to ensure support compliance.
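
The job-state check in step 5 (Verify Eyeglass Job States) amounts to comparing two observed states with two expected ones. The Python sketch below shows that comparison on a plain dictionary of policy names and states, filled in by hand from the Eyeglass Jobs window. The dictionary layout, the function name, and the policy names in the usage note are assumptions for illustration and not an Eyeglass API.

```python
from typing import Dict, List


def job_state_problems(
    states: Dict[str, str], cluster1_policy: str, mirror_policy: str
) -> List[str]:
    """Compare observed job states with the states expected before failback.

    Expected: the Cluster2 mirror policy is Enabled and the Cluster1
    policy is Disabled. Returns a list of human-readable problems,
    empty when both states match.
    """
    expected = {mirror_policy: "Enabled", cluster1_policy: "Disabled"}
    problems: List[str] = []
    for policy, wanted in expected.items():
        found = states.get(policy)
        if found is None:
            problems.append(f"{policy}: not found")
        elif found.lower() != wanted.lower():
            problems.append(f"{policy}: expected {wanted}, found {found}")
    return problems
```

As a usage example, with a Cluster1 policy named EyeglassRunbookRobot-SMB and a mirror policy assumed to be named EyeglassRunbookRobot-SMB_mirror, an empty result from `job_state_problems` means the states match step 5 and the Configuration Data Replication and readiness steps can follow.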

Next Steps

You may now explore the Failover Operations section of the documentation for more information about failover.