Failover Validations
Introduction
The Failover Readiness Validations DR Dashboard is a critical tool for assessing and ensuring disaster recovery (DR) preparedness. This document provides an overview of the validation checks and metrics used for failover. It highlights key areas such as system replication status, backup readiness, and synchronization of critical configurations.
Monitor DR Readiness
To assess and manage your Disaster Recovery (DR) readiness, navigate to the DR Dashboard window. This tool provides an overview of key failover readiness metrics across various policies and zones.
- Zone Readiness: Shows the status of your Access Zone Failover readiness.
- IP Pool Readiness: Indicates the readiness for IP Pool Failover.
- DFS Readiness: Shows your readiness for SyncIQ Policy in DFS Mode Failover.
- Policy Readiness: Displays the status of your SyncIQ Policy Failover readiness.
- Protected Path Readiness: Shows the status of Protected Path readiness for failover.
- LiveOps DR Testing: Displays the status of DR testing.
SyncIQ Audit Monitor - Verify Data is Replicating
The SyncIQ Audit Monitor verifies data replication by adding test data with timestamps to the source cluster and comparing it after SyncIQ runs. It checks for matching timestamps on the target cluster, providing the highest level of confidence in your offsite data. This job is disabled by default and requires configuration.
Requirements
- Version 2.5.7 or later
How It Works
- A test file is created in a hidden directory with a timestamp using the cluster file API.
- SyncIQ policies are monitored when they run successfully.
- After SyncIQ runs, the remote DR cluster is checked using the file API to verify the timestamp matches the source cluster.
- If the validation fails, an alarm is triggered, flagging a data sync issue.
- If the validation passes, the job completes without any alerts.
- Each monitor is set up on a single SyncIQ policy that has been configured for monitoring.
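The write-then-compare cycle described above can be sketched in Python. This is a minimal illustration under stated assumptions: both the source and target policy paths are treated as local directories (Eyeglass actually uses the cluster file API), and the marker file name `synciq_monitor.txt` and the helper functions are hypothetical.

```python
import os
from datetime import datetime, timezone
from typing import Callable

MONITOR_DIR = ".iglssynciqmonitor"   # hidden folder name from the configuration steps
MARKER_FILE = "synciq_monitor.txt"   # hypothetical test-file name

def write_marker(policy_root: str) -> str:
    """Source side: write a timestamped test file into the hidden monitor folder."""
    folder = os.path.join(policy_root, MONITOR_DIR)
    os.makedirs(folder, exist_ok=True)
    stamp = datetime.now(timezone.utc).isoformat()
    with open(os.path.join(folder, MARKER_FILE), "w", encoding="utf-8") as f:
        f.write(stamp)
    return stamp

def verify_marker(expected_stamp: str, target_root: str) -> bool:
    """Target side: after SyncIQ runs, the replicated file must carry the same timestamp."""
    path = os.path.join(target_root, MONITOR_DIR, MARKER_FILE)
    try:
        with open(path, encoding="utf-8") as f:
            return f.read() == expected_stamp
    except FileNotFoundError:
        return False  # the test file never reached the target

def run_check(source_root: str, target_root: str, replicate: Callable[[], None]) -> str:
    stamp = write_marker(source_root)
    replicate()  # stands in for a successful SyncIQ policy run
    return "OK" if verify_marker(stamp, target_root) else "ALARM: data sync issue"
```

In a test, `replicate` can be any callable that copies the source tree to the target; passing a no-op simulates a policy that did not move data, which raises the alarm.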
Configuration: Monitoring SyncIQ Policy Data Integrity
To monitor a policy's data integrity, complete the following steps for each SyncIQ policy that requires monitoring. The steps can be applied to any policy, but only policies that need monitoring require them. Once this setup is complete on a policy, all remaining steps are automated.
- Log in to the Source Cluster as Root
  ssh root@<source-cluster-ip>
- Create a Hidden Folder in the SyncIQ Policy Path
  For example, if the policy path is /ifs/data/userdata/smbdata, create a hidden folder:
  mkdir -p /ifs/data/userdata/smbdata/.iglssynciqmonitor
  Note: The folder name starts with a "." to make it hidden.
- Change Ownership of the Folder
  Set ownership so that the Eyeglass service account can create a test file in this folder:
  chown eyeglass:wheel /ifs/data/userdata/smbdata/.iglssynciqmonitor
- Verify the SyncIQ Monitor Job is Enabled
  Log in to Eyeglass as the admin user, then open a root shell:
  ssh admin@<eyeglass-ip>
  sudo -s
  Then enable the SyncIQ Monitor Task and verify it:
  igls admin schedules set --id SyncMonitorTask --enabled true
  igls admin schedules
  Note: Verify that the job is enabled and that the default schedule is set to run hourly.
- Alarm Notification
  If the monitor detects an issue with data synchronization, an alarm is raised automatically.
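Before relying on the monitor, you may want to confirm that the hidden folder from the steps above exists and has the expected owner. The sketch below assumes Python can run on a host that sees the /ifs path, which may not hold in your environment; `monitor_folder` and `check_monitor_folder` are hypothetical helpers, not Eyeglass or OneFS commands.

```python
import os
import pwd
from typing import List

def monitor_folder(policy_path: str) -> str:
    """Return the hidden monitor folder path for a SyncIQ policy root path."""
    return os.path.join(policy_path.rstrip("/"), ".iglssynciqmonitor")

def check_monitor_folder(policy_path: str, owner: str = "eyeglass") -> List[str]:
    """Return a list of problems; an empty list means the folder looks ready."""
    folder = monitor_folder(policy_path)
    if not os.path.isdir(folder):
        return [f"missing folder: {folder}"]
    uid = os.stat(folder).st_uid
    try:
        actual = pwd.getpwuid(uid).pw_name
    except KeyError:  # uid without a passwd entry
        actual = str(uid)
    if actual != owner:
        return [f"{folder} is owned by {actual}, expected {owner}"]
    return []
```

For example, `check_monitor_folder("/ifs/data/userdata/smbdata")` returns an empty list once the `mkdir` and `chown` steps have been completed.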
Validations DFS and Policy
Policy Readiness and DFS Readiness
The DR Failover Status section on the Policy Readiness or DFS Readiness tabs offers a quick way to assess your Disaster Recovery (DR) status. It supports both SyncIQ Policy Failover in non-DFS mode and SyncIQ Policy Failover in DFS mode.
Click the DR Failover Status to see details for each validation criterion for the selected job.
Column | Description | Notes |
---|---|---|
Name | Name of the Eyeglass configuration Replication Job | Eyeglass automatically creates this replication job for each SyncIQ Policy. The job uses the same name prefixed with the PowerScale OneFS Cluster name. Quota jobs are suffixed with "quotas". |
SyncIQ Policy | Name of the SyncIQ Policy associated with the Eyeglass Job | The Eyeglass job matches the SyncIQ Policy name. |
Source | The PowerScale OneFS cluster that is the source configured in the SyncIQ Policy | The Eyeglass job shares the same source. |
Destination | The PowerScale OneFS cluster that is the target configured in the SyncIQ Policy | The Eyeglass job shares the same target. |
DR Failover Status | Status calculated by Eyeglass based on failover validation criteria for the Policy or DFS Failover mode | This column shows the overall status. Click the link to see individual validation criteria. The "Failed Over" status is shown for any read-only SyncIQ policy. |
The DR Failover Status will be one of the following:
Status | Failover Impact |
---|---|
OK | Able to Failover. |
WARNING | Warning state does NOT block failover. |
ERROR | Error state BLOCKS failover. |
DISABLED | Disabled state DOES block failover. |
FAILED OVER | Failed Over state DOES block failover. |
Although a WARNING does not block failover, the issues causing it may still cause the failover to fail. Resolve these issues before failing over.
The DR Failover Status is based on the status of each of the following areas for Policy and DFS Failover:
Criteria | Description |
---|---|
PowerScale OneFS SyncIQ Readiness | Have the SyncIQ policies in the Access Zone been successfully run and are they in a supported configuration? |
Eyeglass Configuration Replication Readiness | Have the Eyeglass Configuration Replication jobs in the Access Zone been successfully run to sync configuration data for all policies in the Access Zone? |
Zone Configuration Replication Readiness | Has the Eyeglass Zone Configuration Replication job been successfully run to create target cluster Access Zones that don't already exist? |
Date-Time Validation | Are the date-time differences between nodes and between Eyeglass and the clusters within an acceptable range to not affect SyncIQ operations? |
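The status table and the criteria list above can be combined into a small sketch of how an overall status might be derived from per-criterion results. The severity order used here (OK, WARNING, ERROR, DISABLED, FAILED OVER, with the most severe status winning) is an assumption for illustration. The document states only which states block failover, not how Eyeglass combines them.

```python
from enum import IntEnum
from typing import Dict

class DRStatus(IntEnum):
    # Assumed severity order: a higher value wins when statuses are combined.
    OK = 0
    WARNING = 1
    ERROR = 2
    DISABLED = 3
    FAILED_OVER = 4

# From the status table: only OK and WARNING allow failover to start.
BLOCKING = {DRStatus.ERROR, DRStatus.DISABLED, DRStatus.FAILED_OVER}

def overall_status(criteria: Dict[str, DRStatus]) -> DRStatus:
    """Combine per-criterion statuses into one overall DR Failover Status."""
    return max(criteria.values(), default=DRStatus.OK)

def blocks_failover(status: DRStatus) -> bool:
    return status in BLOCKING
```

For example, a policy with an OK SyncIQ Readiness result but a WARNING from Date-Time Validation ends up in WARNING overall, which does not block failover.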
Additional information for Policy / DFS Readiness criteria is provided in the following sections.
Policy/DFS Readiness - PowerScale OneFS SyncIQ Readiness
The PowerScale OneFS SyncIQ Readiness criteria are used to validate that the SyncIQ policy has been successfully run and that it is in a supported configuration. The following checks are performed for each SyncIQ policy:
- SyncIQ Policy Status
- Quota Domain Validation
- SyncIQ File Pattern Validation
- Corrupt Failover Snapshots
- Policy Local Target Validation
- Policy Enabling
Validations in Release 2.0 and later
Validation | Notes |
---|---|
Previous Failed DFS failover share prefix | This detects any prefixed DFS shares on the active cluster. The active cluster should not have any prefixed share on the SyncIQ policy. This validation will indicate if prefixed shares are detected that should be cleaned up prior to any failover. |
Domain Mark Validation | Domain Mark validation applies to all failover types. DR Status is Warning when the validation fails. Failover should not be started with this validation warning. |
Policy Source Nodes Restriction | Validates the PowerScale OneFS best practice that SyncIQ Policies use the Restrict Source Nodes option to control which nodes replicate between clusters. |
SyncIQ Policy Status Validations | DR Status is "OK" when all of the following conditions are met: - Your SyncIQ Policy is enabled. - Your SyncIQ Policy's last state was finished or needs attention. DR Status is "Warning" when: - The SyncIQ Policy has a last state that was not successful. - The SyncIQ Policy has a last state that was paused or canceled. - The SyncIQ Policy does not have a last state (it has never been run). - The SyncIQ Policy has Excluded Directories and/or Included Directories configured. IMPORTANT: A SyncIQ Policy in Warning state MAY NOT be runnable by Eyeglass-assisted failover, depending on its current status. Example 1: The SyncIQ Policy has an error state. If it cannot be run from PowerScale OneFS, it cannot be run from Eyeglass either. Example 2: The SyncIQ Policy is paused. Eyeglass failover cannot RESUME a paused SyncIQ Policy; it must be resumed from PowerScale OneFS. You must investigate these errors and understand their impact on your failover solution. DR Status is "Disabled" when: - The Eyeglass configuration replication job is disabled, or - The SyncIQ policy in PowerScale OneFS is disabled. |
Quota Domain Validation | Detects a quota with the "needs scanning" flag set. This flag causes SyncIQ steps (run policy, Make Writable, and Resync prep) to fail for any policy with a quota that has not been scanned and is missing a quota domain. Failover should not be started with this validation warning. Run a quota scan job manually, or verify whether a quota scan job is already in progress. |
SyncIQ File Pattern Validation | SyncIQ policies with file patterns set cannot be failed back, and any files that do not match the file pattern will be read-only after failover. This file pattern is not failed over by Resync prep to mirror policies and Eyeglass does not support copying file access patterns to mirror policies. This validation will show a warning for any policy with a file pattern set. The file pattern should be removed from the policy to clear the warning. |
Corrupt Failover Snapshots | Validate that the Target Cluster does not have an existing SIQ-<policyID>-restore-new or SIQ-<policyID>-restore-latest snapshot from previous failovers or SyncIQ jobs for the Policy. |
Policy Local Target Validation: Duplicate SyncIQ Local Targets | Validate that there is only 1 Local Target per SyncIQ policy. |
Policy Local Target Validation: Target Writes Disabled | Validate that the target folder of SyncIQ policy has writes disabled. |
Policy Enabling | Validate that the SyncIQ policy is enabled in PowerScale OneFS. If disabled, overall DR Status is Disabled. |
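The SyncIQ Policy Status rules in the table above can be expressed as a small decision function. This is a sketch only: the `PolicyInfo` fields and the state strings (`finished`, `needs_attention`, `paused`, `canceled`) are hypothetical stand-ins for the values reported by PowerScale OneFS and Eyeglass.

```python
from dataclasses import dataclass, field
from typing import List, Optional

OK_STATES = {"finished", "needs_attention"}  # last states that count as OK

@dataclass
class PolicyInfo:
    policy_enabled: bool              # SyncIQ policy enabled in PowerScale OneFS
    eyeglass_job_enabled: bool        # Eyeglass configuration replication job enabled
    last_state: Optional[str] = None  # None means the policy has never run
    included_dirs: List[str] = field(default_factory=list)
    excluded_dirs: List[str] = field(default_factory=list)

def policy_status(p: PolicyInfo) -> str:
    # Either side being disabled makes the overall status DISABLED.
    if not (p.policy_enabled and p.eyeglass_job_enabled):
        return "DISABLED"
    # Never run, paused, canceled, or otherwise unsuccessful -> WARNING.
    if p.last_state not in OK_STATES:
        return "WARNING"
    # Included/Excluded Directories configured -> WARNING.
    if p.included_dirs or p.excluded_dirs:
        return "WARNING"
    return "OK"
```

Checking DISABLED first mirrors the table, where a disabled policy or Eyeglass job sets the overall DR Status to Disabled regardless of the last run state.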
Policy/DFS Readiness - Eyeglass Configuration Replication Readiness
The Eyeglass Configuration Replication Readiness criterion validates that the Eyeglass Configuration Replication job related to the SyncIQ Policy has been successfully run to sync the related configuration data.
Policy/DFS Readiness - Date-Time Validation
Date-Time Validation ensures that the time difference between cluster nodes and between clusters and Eyeglass remains within a range that does not impact SyncIQ operations. Large time differences between the Eyeglass VM and the cluster, especially when API calls are involved, can cause issues such as failed re-sync preparation. For example, if a cluster node's timestamp for a step is earlier than the re-sync prep request, the process may fail. Differences in completion times between clusters can also lead to failures if the time gap exceeds the duration required for the re-sync prep command to complete.
Timing differences that cause resync failover commands to fail are rare and difficult to detect manually. However, starting with release 1.8, Eyeglass can identify this condition. It is a best practice to configure NTP on both the clusters and the Eyeglass appliance to ensure synchronized time. This setup enables failover logs to capture SyncIQ reports for each step and append them to the failover log. As a result, debugging multi-step, multi-cluster failovers becomes more straightforward.
For each cluster, the following checks are performed:
Date-Time Validation | Notes |
---|---|
Nodes Validation | Validates that the maximum time difference between the nodes of a cluster is less than the time required for the cluster node time request made by Eyeglass to complete. DR Status Validation results in a warning if it fails. You may generally proceed with this warning. |
Eyeglass & Clusters Validation | Executed only if Nodes Validation is successful. Validates that the difference between the earliest node time for a cluster and the Eyeglass appliance time is less than the time required for the cluster node time request made by Eyeglass to complete, plus an additional skew factor (default: 1s). DR Status Validation results in a warning if it fails. You may generally proceed with this warning. |
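The two checks can be sketched as follows. This is an interpretation of the table, not the Eyeglass implementation: times are plain epoch seconds, `request_duration` is assumed to be the measured duration of the node-time request, and the 1-second skew factor is the documented default.

```python
from typing import Dict, Optional

DEFAULT_SKEW = 1.0  # seconds; documented default additional skew factor

def nodes_validation(node_times: Dict[str, float], request_duration: float) -> bool:
    """The spread between node clocks must be smaller than the request duration."""
    times = list(node_times.values())
    return max(times) - min(times) < request_duration

def eyeglass_cluster_validation(node_times: Dict[str, float], eyeglass_time: float,
                                request_duration: float,
                                skew: float = DEFAULT_SKEW) -> Optional[bool]:
    """Compare the earliest node time with Eyeglass time; None means the check was skipped."""
    if not nodes_validation(node_times, request_duration):
        return None  # only executed when Nodes Validation succeeds
    earliest = min(node_times.values())
    return abs(earliest - eyeglass_time) < request_duration + skew
```

With three nodes within 0.2 s of each other and a 0.5 s request, an Eyeglass clock 0.9 s ahead passes (0.9 < 1.5), while one 3 s ahead produces a warning.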
DR Dashboard Job Details
Each Policy or DFS Job can be expanded in the DR Dashboard Policy Readiness or DFS Readiness view to see Job Details:
Column | Description | Notes |
---|---|---|
SyncIQ Policy | All information in the SyncIQ Policy details comes from the PowerScale OneFS Cluster itself. | If empty, the job is a custom Eyeglass job not associated with a SyncIQ Policy. |
Job Name | Name of the SyncIQ Policy. | The same name as on the PowerScale OneFS Cluster. |
Last Started | Date/time when the last SyncIQ Policy job started. | Retrieved from SyncIQ Policy details on the PowerScale OneFS Cluster. |
Last Success | Date/time when the SyncIQ Policy last successfully ran. | Retrieved from SyncIQ Policy details on the PowerScale OneFS Cluster. |
Last Job State | Status of the last SyncIQ Policy job. | Retrieved from SyncIQ Policy details on the PowerScale OneFS Cluster. Used to determine Overall DR Status. |
Enabled | Indicates if the PowerScale OneFS SyncIQ policy is enabled. | |
Eyeglass Configuration Replication | All information in the Eyeglass Configuration Replication details comes from Eyeglass. | |
Job Name | Name of the Eyeglass Configuration Replication Job. | Automatically created for each SyncIQ Policy detected with the same name and prefixed with the PowerScale OneFS Cluster name. Quota Jobs also suffixed with "quotas". |
Last Run | Date/time when the last Eyeglass Configuration Replication job started. | Used to determine Overall DR Status. |
Last Success | Date/time when Eyeglass Configuration Replication Job last successfully ran. | Used to determine Overall DR Status. |
Audit Status | Status of the Eyeglass Configuration Replication Job Audit. | After the Eyeglass Configuration Replication Job is complete, an audit compares source and destination configurations to ensure they are identical. |
Enabled | Indicates if the Eyeglass Configuration Replication Job is enabled. | |
Last Successful Readiness Check | Date/time when Eyeglass last successfully ran the Readiness Check Job. | |
Validations Access Zone and IP Pool
Zone and IP Pool Readiness
The Zone Readiness DR Failover Status allows you to quickly assess your Disaster Recovery (DR) status for an Access Zone failover. The Zone Readiness check is conducted in both directions of a replicating PowerScale OneFS cluster pair, ensuring you have the status for both failover and failback.
The Zone Readiness Status will be one of the following:
Status | Description |
---|---|
OK | All Required and Recommended conditions that are validated by Eyeglass software have been met. |
WARNING | One or more Recommended conditions that are validated by Eyeglass software have not been met. Warning state does NOT block failover. Review the Access Zone Failover Guide Requirements/Recommendations to determine the impact of unmet recommendations. |
ERROR | One or more Required conditions that are validated by Eyeglass software have not been met. Error state DOES block failover. Review the Access Zone Failover Guide Requirements/Recommendations to determine how to resolve these error conditions. |
FAILED OVER | This Access Zone on this cluster has been failed over. You will be blocked from initiating failover for this Access Zone on this Cluster. |
Not all conditions are validated by Eyeglass software. Please refer to the Access Zone Failover Guide Requirements/Recommendations for a complete list of requirements and recommendations.
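The Zone Readiness status rules can be summarized in a short sketch. The inputs are hypothetical: this section does not list which conditions are Required and which are Recommended (see the Access Zone Failover Guide for that), and checking FAILED OVER first is an assumed precedence.

```python
from typing import Iterable

def zone_status(failed_over: bool,
                required_met: Iterable[bool],
                recommended_met: Iterable[bool]) -> str:
    if failed_over:
        return "FAILED OVER"  # failover is blocked for this zone on this cluster
    if not all(required_met):
        return "ERROR"        # an unmet Required condition blocks failover
    if not all(recommended_met):
        return "WARNING"      # an unmet Recommended condition does not block failover
    return "OK"
```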
- If the target cluster pool that has the Eyeglass hint mapping for failover does not have a SmartConnect Zone defined:
  - On failover, the Access Zone will be in a Warning state due to SPN inconsistencies.
  - On failback, the FAILED OVER status will not be displayed.
- If there is no Eyeglass Configuration Replication Job enabled in an Access Zone, there will be no entry in the Zone Readiness table for that Access Zone.
- Until Configuration Replication runs, Policy Readiness for a policy in the Access Zone will be in an Error state.

The DR Failover Status is based on the status of each of the following areas for Access Zone failover:
Area | Description |
---|---|
PowerScale OneFS SyncIQ Readiness | Have the SyncIQ policies in the Access Zone been successfully run, and are they in a supported configuration? |
Eyeglass Configuration Replication Readiness | Have the Eyeglass Configuration Replication jobs in the Access Zone successfully run to sync configuration data for all policies that are members of the Access Zone? |
SPN Readiness | Is Active Directory delegation completed for cluster machine accounts to detect missing SPNs and remediate existing and newly created SmartConnect Zones, including both short and long SPNs for cluster Active Directory machine accounts? |
SmartConnect Zone Failover Mapping Readiness | Validation that confirms all IP pools in the Access Zone have an Eyeglass hint (SmartConnect alias using igls syntax). Each SmartConnect Zone name associated with the IP pools must be mapped to a target cluster IP pool prior to any failover. This ensures that all SmartConnect names used to access the source cluster data will failover to a target cluster IP pool. It is best practice and a requirement to create IP pools in matched pairs on both the source and destination clusters. |
SmartConnect/IP Pool Readiness | SmartConnect/IP Pool Failover Readiness provides the status of whether the IP pool is ready for failover or has already failed over. It also verifies each IP pool has a SmartConnect name applied. This validation is used for IP pool-based failover in addition to Access Zone failover, where all pools must have a SmartConnect name defined. |
Zone Configuration Replication Readiness | Has the Eyeglass Zone Configuration Replication job successfully run to create target cluster Access Zones that don't already exist for configuration sync completeness? |
Target Cluster Reachability | Is Eyeglass able to connect to the Failover Target Cluster using API? |
Date-Time Validation | Are the date-time differences between nodes, and between Eyeglass and the clusters, within an acceptable range that will not affect SyncIQ operations? |
Zone Path Validation | Zone Path Validation provides the status of whether Access Zones have colliding paths. Status of OK indicates that the Access Zone paths have no conflicts. Status of ERROR indicates that this Access Zone collides with another Access Zone's path. |
FQDN Alias Validation | If a cluster was added to Eyeglass with FQDN SmartConnect name for management, this SmartConnect zone must have an igls-ignore hint applied to avoid a failover impacting Eyeglass access. An ERROR means no igls-ignore hint was found, while OK means igls-ignore hint was found. |
By default, the Failover Readiness job, which populates this information, is disabled. Instructions to enable this job can be found in the Eyeglass PowerScale OneFS Edition Administration Guide.
If there are no Eyeglass Configuration Replication Jobs enabled, there is no Failover Readiness Job.
Preparation and planning instructions for Zone Readiness can be found in the Access Zone Failover Guide.
Zone and IP Pool Readiness - PowerScale OneFS SyncIQ Readiness
The PowerScale OneFS SyncIQ Readiness criteria are used to validate that the SyncIQ policies in the Access Zone have been successfully run and are in a supported configuration. You will find one entry per SyncIQ Policy in the Access Zone. For each SyncIQ Policy, the following checks are performed:
Validations in Release 2.0 and later
SyncIQ Policy Check | Notes |
---|---|
Policy Hot/Hot Validation | For Hot-Hot (Active-Active data) replication topology, validate that there is a dedicated Access Zone for each replication direction. |
Policy Zone Path Check | Validate that the SyncIQ Policy's source root and target directories are at or below the Access Zone Base Directory. |
- Policy Source Path Check | |
- Policy Target Path Check | |
Policy Source Nodes Restriction | Validates the PowerScale OneFS best practice that SyncIQ Policies use the Restrict Source Nodes option to control which nodes replicate between clusters. |
Policy Hostname Validation | Validate that the SyncIQ Policy target host hostname is associated with a subnet pool that is not going to be failed over. |
Corrupt Failover Snapshots | Validate that the Target Cluster does not have an existing SIQ-<policyID>-restore-new or SIQ-<policyID>-restore-latest snapshot from previous failovers or SyncIQ jobs for the Policy. |
System Zone Config Restriction | Validate that all shares, exports, and aliases have been created in the Access Zone that is being failed over. Shares, exports, and aliases with a path outside of (higher in the file system than) the Access Zone base path are not supported. |
Policy Enabling | Validate that the SyncIQ Policy is enabled in PowerScale OneFS. |
Quota Domain Validation | Detects a quota with the "needs scanning" flag set. This flag causes SyncIQ steps (run policy, Make Writable, and Resync prep) to fail for any policy with a quota that has not been scanned and is missing a quota domain. Failover should not be started with this validation warning. Run a quota scan job manually, or verify whether a quota scan job is already in progress. |
SyncIQ File Pattern Validation | SyncIQ policies with file patterns set cannot be failed back, and any files that do not match the file pattern will be read-only after failover. This file pattern is not failed over by Re-sync prep to mirror policies, and Eyeglass does not support copying file access patterns to mirror policies. This setting should not be used for DR purposes. This validation will show a warning for any policy with a file pattern set. The file pattern should be removed from the policy to clear the warning. |
Policy Status | Validate that the SyncIQ Policy is not in an error state in PowerScale OneFS. |
Policy Local Target Validation | Validate that there is only one Local Target per SyncIQ policy. |
Duplicate SyncIQ Local Targets | Validates that no duplicate policy local targets are found. |
Target Writes Disabled | Validate that the target folder of SyncIQ policy has writes disabled. |
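Two of the checks above are simple enough to sketch: the Policy Zone Path Check (source root and target directories at or below the Access Zone base directory) and the Corrupt Failover Snapshots check (leftover SIQ-<policyID>-restore-new or SIQ-<policyID>-restore-latest snapshots). The function names are hypothetical, and the sketch assumes the Access Zone base path is the same on both clusters; Eyeglass reads this data through the PowerScale OneFS API.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List

def at_or_below(path: str, base: str) -> bool:
    """True if path equals base or is nested under it (compared by path component)."""
    p, b = PurePosixPath(path), PurePosixPath(base)
    return p == b or b in p.parents

def zone_path_check(source_root: str, target_dir: str, zone_base: str) -> Dict[str, bool]:
    # Assumes the same Access Zone base path on source and target clusters.
    return {
        "Policy Source Path Check": at_or_below(source_root, zone_base),
        "Policy Target Path Check": at_or_below(target_dir, zone_base),
    }

def leftover_restore_snapshots(policy_id: str, snapshot_names: Iterable[str]) -> List[str]:
    """Return target-cluster snapshots that would fail the Corrupt Failover Snapshots check."""
    suspects = {f"SIQ-{policy_id}-restore-new", f"SIQ-{policy_id}-restore-latest"}
    return [name for name in snapshot_names if name in suspects]
```

Comparing path components rather than string prefixes avoids treating /ifs/data/zone10 as being inside /ifs/data/zone1.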
Zone and IP Pool Readiness - Eyeglass Configuration Replication Readiness
The Eyeglass Configuration Replication Readiness criteria are used to validate that the Eyeglass Configuration Replication jobs in the Access Zone have been successfully run, to sync configuration data for all policy members of the Access Zone. For each Eyeglass Configuration Replication Job in the Access Zone, the following check is performed:
Job Name | Description |
---|---|
Eyeglass Configuration Replication Job Name | Validate that the Eyeglass Configuration Replication Job is not in the ERROR state. This ensures that configuration sync is functioning correctly for the Access Zone. |
With both enabled and disabled Eyeglass Configuration Jobs in the Access Zone, the Eyeglass Configuration Replication Readiness validation will only display status for the Enabled jobs.
Zone and IP Pool Readiness - SPN Readiness
The SPN Readiness validation ensures that Service Principal Names (SPNs) are correctly configured for failover and failback in PowerScale OneFS environments. This includes checking SPN case sensitivity, syntax correctness, and automatic insertion.
Prerequisites: Eyeglass version 2.5.6 or later for enhanced SPN management and failover.
The SPN Readiness criteria are used to:
- Detect missing SPNs and insert them into AD based on PowerScale OneFS's list of missing SPNs. The AD Delegation step must be completed to support the auto-insert feature.
- Remediate SPNs for existing and newly created SmartConnect Zones by creating short and long SPNs on each cluster's Active Directory machine account.
- (2.5.6 or later releases) Check the case of the SPN in Active Directory against the case of the SmartConnect zone name.
  Note: SPNs are case-sensitive and must match the case of the cluster SmartConnect name or alias. For example, HOST/Data.example.com and HOST/data.example.com are different, and the case must match for correct Kerberos authentication. Failover requires the case to match the PowerScale OneFS configuration. This validation detects incorrect case.
- (2.5.6 or later releases) Check that the SPN syntax in AD is correct. The service class must be uppercase (HOST/xxxx); a lowercase service class such as host/xxxx is invalid for failover and authentication.
- (2.5.6 or later releases) Support additional service classes for custom SPN insertion into AD and for failover. Supported SPNs include NFS, HDFS, WEB, and any other custom SPN required for failover and automatic insertion.
  - See the Access Zone Failover configuration guide for how to enable custom SPNs.
This check is done for each domain of which each cluster is a member.
For cases where the PowerScale OneFS Cluster is not joined to Active Directory, the SPN Readiness will show the following:
- For PowerScale OneFS 7.2, the SPN Readiness check is displayed with the message: "Cannot determine SPNs."
- For PowerScale OneFS 8, the SPN Readiness check is not displayed in the Zone Readiness window.
2.5.6 or later releases:
Each SPN that should be associated with the AD cluster computer object, based on the SmartConnect names or aliases, is displayed and marked as green "OK" if it matches the correct case and SPN syntax. A warning will be shown for each SPN detected with incorrect case or syntax issues.
Example warning messages displayed when selecting a warning SPN entry in the validation UI:
- Warning for lowercase SPN host/xxxxx
  - Not a valid SPN. The service class in the SPN definition should be HOST.
- Warning for incorrect case in the SmartConnect name or alias
  - Not a valid SPN. SPN entries are case-sensitive and should match the case used on the SmartConnect name.
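The case and syntax rules above can be sketched as a small validator that returns the same warning messages shown in the UI. This is an illustration only: it checks one SPN string against one SmartConnect name or alias, assumes the `SERVICECLASS/hostname` form, and does not reflect how Eyeglass queries Active Directory. The message for a missing "/" is hypothetical.

```python
from typing import List

SERVICE_CLASS_MSG = "Not a valid SPN. The service class in the SPN definition should be {}."
CASE_MSG = ("Not a valid SPN. SPN entries are case-sensitive and should match "
            "the case used on the SmartConnect name.")

def validate_spn(spn: str, smartconnect_name: str) -> List[str]:
    """Return warning messages for one SPN; an empty list means OK."""
    service_class, sep, host = spn.partition("/")
    if not sep:
        # Hypothetical message, not taken from the Eyeglass UI.
        return ["Not a valid SPN. Expected the form SERVICECLASS/hostname."]
    warnings = []
    # The service class (HOST, NFS, HDFS, ...) must be uppercase.
    if service_class != service_class.upper():
        warnings.append(SERVICE_CLASS_MSG.format(service_class.upper()))
    # Same name apart from letter case -> case mismatch warning.
    if host != smartconnect_name and host.lower() == smartconnect_name.lower():
        warnings.append(CASE_MSG)
    return warnings
```

For example, `validate_spn("host/Data.example.com", "data.example.com")` returns both warnings, while `validate_spn("HOST/data.example.com", "data.example.com")` returns an empty list.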
Zone and IP Pool Readiness - SmartConnect Zone Failover Mapping Readiness
The SmartConnect Zone Failover Mapping Readiness criteria validate that the SmartConnect Zone alias hints have been created between source and target cluster subnet IP pools. This check is performed for each subnet:pool in the Access Zone.
Use the Zone Readiness View Mapping feature to display pools in the Access Zone and how they have been mapped using the SmartConnect Zone Alias hints.
Zone and IP Pool Readiness - View Mapping
Use the Zone Readiness View Mapping link to display the subnet:pool mappings with configured hints for the Access Zone.
Zone and IP Pool Readiness - Zone Configuration Replication Readiness
The Zone Configuration Replication Readiness criteria validate that the Zone Configuration Replication jobs in the Access Zone have been successfully run to create target cluster Access Zones that do not already exist, ensuring configuration sync completeness.
Zone and IP Pool Readiness - Target Cluster Reachability
The Target Cluster Reachability criteria validate that Eyeglass is able to connect to the Failover Target Cluster using the PowerScale OneFS API.
Zone and IP Pool Readiness - Date-Time Validation
The Date-Time Validation ensures that the time difference between cluster nodes, and between the clusters and Eyeglass, is within an acceptable range that will not affect SyncIQ operations. Because of latency in issuing API calls, SyncIQ commands such as re-sync prep can fail if the time difference between cluster nodes exceeds the time difference between the Eyeglass VM and the cluster. This can happen when a node returns a timestamp for a step status message that is earlier than the start of the re-sync prep request. API calls can also return different completion times on each cluster, and re-sync prep failover commands can fail if the time difference between Eyeglass and the source cluster is greater than the time it takes for the re-sync prep command to complete.
Timing differences that cause resync failover commands to fail are rare and hard to detect manually. Starting with release 1.8, Eyeglass can detect this condition. It is best practice to use NTP on the clusters and the Eyeglass appliance. Synchronized time also lets the release 1.8 or later feature collect SyncIQ reports for each step and append them to the failover log, which simplifies debugging multi-step, multi-cluster failovers.
For each cluster, the following checks are performed:
Date-Time Validation | Notes |
---|---|
Nodes Validation | Validates that the maximum time difference between the nodes of a cluster is less than the time required for the cluster node time request made by Eyeglass to complete. |
Eyeglass & Clusters Validation | Validates that the difference between the earliest node time for a cluster and the Eyeglass appliance time is less than the time required for the cluster node time request made by Eyeglass to complete, plus an additional skew factor (default: 1s). Executed only if Nodes Validation is OK. |
Zone and IP Pool Path Validation
Zone Path Validation reports whether Access Zones have colliding paths. A status of OK indicates that the Access Zone paths have no conflicts. A status of ERROR indicates that this Access Zone collides with another Access Zone's path.
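One way to detect colliding Access Zone paths is to flag any pair of zones whose base paths are equal or nested inside one another. This is a sketch of that idea under stated assumptions, not the Eyeglass implementation: the zone names are hypothetical, and the System zone (base path /ifs) is excluded because it contains every other zone's path.

```python
from itertools import combinations
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Tuple

def paths_collide(a: str, b: str) -> bool:
    """True if the two paths are equal or one is nested inside the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents

def zone_path_collisions(zones: Dict[str, str],
                         ignore: Iterable[str] = ("System",)) -> List[Tuple[str, str]]:
    """Return pairs of zone names whose base paths overlap."""
    skip = set(ignore)
    items = sorted((name, path) for name, path in zones.items() if name not in skip)
    return [(z1, z2) for (z1, p1), (z2, p2) in combinations(items, 2)
            if paths_collide(p1, p2)]
```

For example, zones based at /ifs/data/sales and /ifs/data/sales/eu collide, while /ifs/data/hr does not collide with either.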
Access Zone and IP Pool FQDN Alias Validation
If the cluster was added to Eyeglass with an FQDN SmartConnect name for management, this SmartConnect Zone must have an igls-ignore hint applied to avoid a failover impacting Eyeglass access.
- A status of Error indicates that no igls-ignore hint was found on the IP pool for the SmartConnect Zone used for cluster management.
- A status of OK indicates that the igls-ignore hint was found.
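The rule above reduces to a lookup: find the pool whose SmartConnect zone name or alias matches the management FQDN, then check that pool for an alias beginning with igls-ignore. The `Pool` structure and pool names below are hypothetical illustrations of the inventory data.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Pool:
    name: str                 # "subnet:pool"
    smartconnect_zone: str
    aliases: List[str] = field(default_factory=list)

def fqdn_alias_status(mgmt_fqdn: str, pools: List[Pool]) -> Optional[str]:
    """Return 'OK' or 'ERROR', or None if no pool serves the management FQDN."""
    target = mgmt_fqdn.lower()  # DNS names compare case-insensitively
    for pool in pools:
        names = [pool.smartconnect_zone] + pool.aliases
        if target in (n.lower() for n in names):
            has_ignore = any(a.startswith("igls-ignore") for a in pool.aliases)
            return "OK" if has_ignore else "ERROR"
    return None  # cluster was not added using a SmartConnect name on these pools
```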
Access Zone and IP Pool - DNS Dual Delegation Validation
Prerequisite: Eyeglass version 2.5.6 or later is required.
Unsupported Configurations
- If Infoblox is configured using forwarders and not dual name servers, this validation will not work. Forwarders are vendor-specific configurations and not standards-based DNS. Nslookup cannot remotely validate dual forwarding configuration. It is recommended to use name servers with Infoblox versus dual forwarders. This validation will need to be disabled if Infoblox dual forwarding is configured. Please open a case with support.
Note: Eyeglass validates the groupNet DNS servers directly and does not use the OS DNS configured on Eyeglass, because the DNS servers that must be configured are the ones used by PowerScale OneFS itself. This requires the Eyeglass VM to have UDP port 53 access to the groupNet DNS servers. If this is not possible, the validation must be disabled. See the relevant section for more details.
This validation automatically checks that each SmartConnect name and alias on the pools has two name servers configured, and that each IP address returned is a subnet service IP (SSIP) on the subnet that serves the IP pool. If any of these tests fail, failover cannot auto-complete the DNS step, leaving it as a manual step. This detects misconfigured or missing dual DNS delegation before planned failovers. If A records are used in the delegation, a reverse lookup is done on the DNS name returned as the name server to validate that the IP is a subnet service IP. This validation also ensures that the cluster pairs configured for failover have the correct SSIP on both clusters; the SSIPs can be found in the inventory for the two clusters.
Example Dual DNS validation
Warnings and Information for all Possible Dual Delegation Validations
Dual delegation validation cases and corresponding information for SmartConnect zone name and SmartConnect zone alias:
- Pool with ignore alias igls-ignore-
  - Additional Information: Pool has igls-ignore hint. Dual Delegation validation skipped.
- Zone or pool with correct settings
  - Additional Information: IP address detected for cluster XXXX.
- SmartConnect Zone name/alias unknown
  - Status: WARNING
  - Additional Information: Could not resolve SmartConnect Zone name to a valid IP address.
- One NS record for SmartConnect Zone name/alias
  - Status: WARNING
  - Additional Information: DNS query returned only one IP address for this SmartConnect Zone name. There should be two IP addresses for the valid dual delegation setting.
- Detected three NS records for SmartConnect Zone name/alias
  - Status: WARNING
  - Additional Information: Dual delegation for this zone is not set up correctly. SmartConnect Zone name resolves to more than 2 IP addresses.
- One of the NS records is incorrect for SmartConnect Zone name/alias
  - Status: WARNING
  - Additional Information: The IP address does not reference a valid cluster.
- There are no NS records for SmartConnect Zone name/alias
  - Status: WARNING
  - Additional Information: Could not resolve SmartConnect Zone name to a valid IP address.
- Two NS records point to the same IP address for SmartConnect Zone name/alias
  - Status: WARNING
  - Additional Information: SmartConnect name server delegations are not dual delegated. Both names resolve to the same IP address.
- SSIP of source cluster (in cluster) is incorrect
  - Status: WARNING
  - Additional Information: The IP address does not reference a valid cluster.
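The NS-record cases above can be summarized as a small decision function. The sketch below is illustrative only and is not Eyeglass code: it assumes the name servers for one SmartConnect zone name or alias have already been resolved to IP addresses, and that the known subnet service IPs (SSIPs) are available as a mapping to cluster names. The function name, its parameters, and the OK status returned for the igls-ignore and correct-settings cases are assumptions; the message strings are copied from the list above. The SSIP-of-source-cluster case is not modeled.

```python
from typing import Dict, List, Tuple

# Message strings copied from the dual delegation cases listed above.
SKIPPED = "Pool has igls-ignore hint. Dual Delegation validation skipped."
UNRESOLVED = "Could not resolve SmartConnect Zone name to a valid IP address."
ONE_IP = ("DNS query returned only one IP address for this SmartConnect Zone name. "
          "There should be two IP addresses for the valid dual delegation setting.")
TOO_MANY = ("Dual delegation for this zone is not set up correctly. "
            "SmartConnect Zone name resolves to more than 2 IP addresses.")
SAME_IP = ("SmartConnect name server delegations are not dual delegated. "
           "Both names resolve to the same IP address.")
NOT_A_CLUSTER = "The IP address does not reference a valid cluster."


def classify_dual_delegation(
    ns_ips: List[str],
    ssip_to_cluster: Dict[str, str],
    ignore_hint: bool = False,
) -> Tuple[str, str]:
    """Hypothetical helper: return (status, additional information).

    ns_ips: IP addresses resolved from the NS records of one SmartConnect
    zone name or alias (assumed to be resolved already).
    ssip_to_cluster: known subnet service IPs mapped to cluster names.
    """
    if ignore_hint:
        return "OK", SKIPPED  # assumed status for a skipped pool
    if not ns_ips:
        return "WARNING", UNRESOLVED
    if len(ns_ips) == 1:
        return "WARNING", ONE_IP
    if len(ns_ips) > 2:
        return "WARNING", TOO_MANY
    if ns_ips[0] == ns_ips[1]:
        return "WARNING", SAME_IP
    if any(ip not in ssip_to_cluster for ip in ns_ips):
        return "WARNING", NOT_A_CLUSTER
    clusters = ", ".join(ssip_to_cluster[ip] for ip in ns_ips)
    return "OK", f"IP address detected for cluster {clusters}."  # assumed OK status
```

For example, with hypothetical SSIPs 10.0.0.10 (cluster prod) and 10.1.0.10 (cluster dr) both returned, the function yields ("OK", "IP address detected for cluster prod, dr.").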
IP Pool Failover Readiness
This interface displays each Access Zone and the associated IP pool defined within the zone. Expanding each pool reveals the SyncIQ policies mapped to that pool.
- The Access zone column shows the cluster:zone name.
- Pool mapping displays the pool-to-pool igls-hints that map a pool on the source cluster to a pool on the target cluster, and allows viewing the mapping.
- Target cluster shows the cluster this pool will fail over to.
- Last Successful Readiness Check shows the date and time at which failover readiness last assessed this pool.
- Map policy to pool allows mapping one or more policies to a pool, and viewing the mapping for all pools in the access zone.
- DR Failover Status shows the highest severity state across all validations, or a failed over status if the pool has been failed over.
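The DR Failover Status rule (the highest severity wins, unless the pool has been failed over) can be sketched as follows. The severity names, their ordering, and the "FAILED OVER" label are assumptions for illustration, not Eyeglass identifiers.

```python
from enum import IntEnum
from typing import Iterable


class Severity(IntEnum):
    # Hypothetical severity ordering; a higher value means more severe.
    OK = 0
    WARNING = 1
    ERROR = 2


def dr_failover_status(results: Iterable[Severity], failed_over: bool = False) -> str:
    """Return the label shown for a pool (sketch, not Eyeglass code)."""
    if failed_over:
        # A failed over pool reports that state instead of a severity.
        return "FAILED OVER"
    # The highest severity wins; a pool with no results is treated as OK.
    return max(results, default=Severity.OK).name
```

Using an IntEnum keeps the "highest severity" comparison a plain max() over the validation results.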
Pool Validations
The pool validations are the same as the Access Zone readiness checks, except that they apply only to the pool itself rather than to the entire zone. This allows each pool to be viewed and prepared for failover independently.
Unmapped Policy Validation Overview
The Pool Readiness validations unique to IP pool failover are the unmapped policy SmartConnect/IP pool status and the overall pool readiness, which summarizes the pool's status.
- This verifies that all SyncIQ policies in the zone have been mapped to a pool.
- A pool may have more than one SyncIQ policy mapped.
- A SyncIQ policy may NOT be mapped to more than one pool.
- Any SyncIQ policy not mapped using the DR Dashboard IP pool mapping interface will raise this error message and will block failover for all pools in the access zone until corrected.
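The mapping rules above (every policy in the zone mapped, each policy mapped to at most one pool, a pool may hold several policies) can be expressed as a short check. This is a hypothetical sketch, not the Eyeglass implementation; the function name, the input shapes, and the error text are assumptions.

```python
from typing import Dict, List, Set


def policy_mapping_errors(zone_policies: Set[str], pool_map: Dict[str, List[str]]) -> List[str]:
    """Sketch of the unmapped policy check for one access zone.

    zone_policies: names of all SyncIQ policies in the zone.
    pool_map: pool name -> SyncIQ policy names mapped to that pool.
    """
    errors: List[str] = []
    owner: Dict[str, str] = {}  # policy name -> first pool it was mapped to
    for pool, policies in pool_map.items():
        for policy in policies:
            first = owner.setdefault(policy, pool)
            if first != pool:
                # A policy may NOT be mapped to more than one pool.
                errors.append(f"{policy} is mapped to more than one pool ({first}, {pool})")
    # Any policy in the zone that no pool claims is unmapped.
    for policy in sorted(zone_policies - set(owner)):
        errors.append(f"{policy} is not mapped to any pool")
    return errors
```

Because any such error blocks failover for every pool in the access zone, a caller would treat a non-empty result as blocking, for example failover_blocked = bool(policy_mapping_errors(...)).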
Overall pool validation status
Network Visualization
The Network Visualization feature provides a new way to view PowerScale OneFS clusters and their DR status, and to jump to the DR Dashboard. It visualizes DR and cluster replication, offering insight into data flows and storage across one or more PowerScale OneFS clusters.
This view shows which clusters are replicating with each other and the direction of replication. For each cluster, failover readiness status for all failover types is summarized.
- A red arrow indicates a failover readiness error from the source to target cluster (failover direction).
- An orange arrow indicates warnings.
- A green arrow represents active replication (failover direction) without issues, and a grey arrow represents a failed over (inactive) direction.
This simplifies monitoring multiple clusters that are replicating.
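The arrow colors described above can be read as a simple mapping from a replication direction's state. The sketch below is illustrative only; the function name and inputs are hypothetical, and the precedence (failed over first, then errors, then warnings) is an assumption.

```python
def arrow_color(failed_over: bool, errors: int, warnings: int) -> str:
    """Hypothetical mapping from a replication direction's state to its arrow color."""
    if failed_over:
        return "grey"    # failed over (inactive) direction
    if errors:
        return "red"     # failover readiness error, source to target
    if warnings:
        return "orange"  # warnings only
    return "green"       # active replication without issues
```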
To view Network Visualization
- Access the Eyeglass menu, then click the Network Visualization icon.
- Zoom in or out to navigate the depth of view.
- Click and hold to drag the visual view objects.
- Click a cluster to view its active sync data, organized by failover mode and status.
- Click the hyperlink to jump directly to the DR Dashboard from the Network Visualization window.
Validations All Failovers, Domain Mark
Prerequisite: Eyeglass version 2.5.6 or later is required.
All Failover Type Domain Mark Validation
Documentation to correct Warnings
Consult PowerScale OneFS Documentation to run the domain mark job.
This validation checks that both the source and target clusters have a valid domain mark for accelerated failback, and raises a warning if either cluster is missing one. It is a crucial validation because it ensures that failovers are not delayed during the resync prep step while waiting for the domain mark job to run.
If the validation fails, run the domain mark job manually on the cluster using the PowerScale OneFS Job UI. Always complete this step before any planned failover. All policies are checked during this validation.
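A minimal sketch of the per-policy check described above, assuming the domain mark state has already been collected for each policy on each cluster. The Policy class, the marks dictionary keyed by (cluster, policy name), and the warning text are hypothetical and not part of Eyeglass.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Policy:
    name: str
    source_cluster: str
    target_cluster: str


def domain_mark_warnings(policies: List[Policy], marks: Dict[Tuple[str, str], bool]) -> List[str]:
    """marks: (cluster, policy name) -> True if a valid domain mark was found (assumed input)."""
    warnings: List[str] = []
    for policy in policies:  # all policies are checked
        for role, cluster in (("source", policy.source_cluster), ("target", policy.target_cluster)):
            # A missing entry is treated the same as a missing domain mark.
            if not marks.get((cluster, policy.name), False):
                warnings.append(f"{policy.name}: {role} cluster {cluster} is missing a domain mark")
    return warnings
```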
How To Configure Advanced DNS Delegation Modes Required for Certain Environments
This feature adds the ability to detect DNS dual delegations above the SmartConnect zone name level in the DNS namespace. For example, if the SmartConnect name is data.example.com, but the delegation is done at the example.com level in DNS, the validation will attempt to locate the delegation above the SmartConnect level to identify the dual delegation.
If the validation scans all DNS names and finds no dual name server entries, the validation fails.
Additional logging for this validation process can be found at: /opt/superna/sca/logs/readiness.log.
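The upward search can be sketched as walking from the SmartConnect name toward the root, one label at a time. This is an illustration under stated assumptions, not the Eyeglass algorithm: lookup_ns is a hypothetical callback that returns the name server entries for a name, and stopping before the top-level domain is an assumption.

```python
from typing import Callable, List, Optional


def candidate_names(smartconnect_name: str) -> List[str]:
    """Names to check, from the SmartConnect name up to (not including) the top-level domain."""
    labels = smartconnect_name.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1)]


def find_dual_delegation(smartconnect_name: str, lookup_ns: Callable[[str], List[str]]) -> Optional[str]:
    """Return the first name with two name server entries, or None (validation fails)."""
    for name in candidate_names(smartconnect_name):
        if len(lookup_ns(name)) == 2:
            return name
    return None
```

For data.example.com the names checked are data.example.com and then example.com, matching the example above.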
These settings allow control over DNS query servers and recursion options needed for some environments.
When to use the options:
- If Eyeglass cannot reach the groupNet DNS servers due to firewall restrictions, enable the local OS DNS option below.
- If your DNS is Bluecat, Bind, or Infoblox, we recommend using the option below to disable recursive lookups.
- If both scenarios apply, set both values.
How to change the values:
- SSH to Eyeglass as admin.
- Run sudo -s (enter the admin password).
- Open the system.xml file: nano /opt/superna/sca/data/system.xml
- Find the <readinessvalidation> tag. Add the tags below with the appropriate settings for your situation. If the tags already exist, update their values to avoid duplicates:
  <dualdelegation_use_eyeglass_dns>true</dualdelegation_use_eyeglass_dns>
  <dualdelegation_recurse>false</dualdelegation_recurse>
- Press control+x, then Y and Enter, to save the file and exit.
- Run systemctl restart sca for the changes to take effect.
- Done.
The tags <dualdelegation_use_eyeglass_dns> and <dualdelegation_recurse> may already exist under the <readinessvalidation> tag in /opt/superna/sca/data/system.xml. In such cases, update the existing values based on your requirements to avoid creating duplicate tags.
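The "add or update, never duplicate" rule can be illustrated with Python's standard ElementTree module. This is a sketch only; the documented procedure is to edit the file with nano as described above. The function name is hypothetical, the root element name of system.xml is not stated in this document, and ElementTree discards XML comments when parsing by default, so this sketch should not be used to rewrite the real file unverified.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def set_readiness_options(xml_text: str, options: Dict[str, str]) -> str:
    """Add or update tags under <readinessvalidation> without creating duplicates."""
    root = ET.fromstring(xml_text)
    if root.tag == "readinessvalidation":
        section = root
    else:
        section = root.find(".//readinessvalidation")
        if section is None:
            # Create the section if it does not exist yet.
            section = ET.SubElement(root, "readinessvalidation")
    for tag, value in options.items():
        matches = section.findall(tag)
        # Reuse the first existing tag, or create it if missing.
        elem = matches[0] if matches else ET.SubElement(section, tag)
        elem.text = value
        # Remove any duplicates left over from earlier edits.
        for duplicate in matches[1:]:
            section.remove(duplicate)
    return ET.tostring(root, encoding="unicode")
```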
How the tags work:
Both tags are independent settings, which result in 4 possible states. When doing dual DNS delegation validation, the behavior is implemented as follows based on the settings:
- If dualdelegation_use_eyeglass_dns is true, Eyeglass gets the DNS server from the local OS instead of from the PowerScale OneFS groupNet, and issues the dual DNS delegation validation request(s) to that DNS server. If it is false, requests continue to be issued to the PowerScale OneFS DNS server on the groupNet (default mode).
- If dualdelegation_recurse is false, recursion is turned off on the DNS query. If it is true, the DNS query uses recursive lookups (default mode).
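The four states produced by the two independent tags can be sketched as a small planning function. This is illustrative, not Eyeglass code: the function and field names are hypothetical, and it assumes that turning off recursion corresponds to clearing the "recursion desired" (RD) bit in the DNS header, as dig +norecurse does.

```python
from typing import List, NamedTuple

RD_FLAG = 0x0100  # "recursion desired" bit in the DNS header flags (RFC 1035)


class QueryPlan(NamedTuple):
    servers: List[str]
    header_flags: int


def plan_dual_delegation_query(
    groupnet_dns: List[str],
    os_dns: List[str],
    use_eyeglass_dns: bool = False,  # dualdelegation_use_eyeglass_dns
    recurse: bool = True,            # dualdelegation_recurse
) -> QueryPlan:
    # true: query the local OS DNS servers; false (default): the groupNet DNS servers
    servers = list(os_dns if use_eyeglass_dns else groupnet_dns)
    # false: clear the RD bit so the server answers without recursion
    flags = RD_FLAG if recurse else 0
    return QueryPlan(servers, flags)
```

Calling the function with every combination of the two booleans (for example via itertools.product) enumerates the four possible states.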
How to Configure Advanced SPN delay mode for Active Directory Delegation
This advanced mode adds a delay between the SPN create and delete tests used during the validation. If the AD domain controllers do not complete the create and delete operations quickly enough, the validation test can fail. The delay is set in seconds between the commands; the default is no delay.
- SSH to Eyeglass as admin.
- Run sudo -s (enter the admin password).
- Open the system.xml file: nano /opt/superna/sca/data/system.xml
- Insert the readinessvalidation tags if they do not exist, and set the spnCreateDeleteWait value, for example changing 0 to 5 seconds. Example of the tag:
  <readinessvalidation>
  <spnCreateDeleteWait>5</spnCreateDeleteWait>
  </readinessvalidation>
- Press control+x, then Y and Enter, to save the file and exit.
- Run systemctl restart sca for the changes to take effect.
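The sketch below shows where the spnCreateDeleteWait delay fits in a create-then-delete test. It is not the Eyeglass implementation: create_spn and delete_spn are hypothetical callbacks standing in for the Active Directory operations, and the sleep parameter is injectable only so the sketch can be exercised without actually waiting.

```python
import time
from typing import Callable


def spn_create_delete_test(
    create_spn: Callable[[str], bool],
    delete_spn: Callable[[str], bool],
    spn: str,
    wait_seconds: float = 0,  # plays the role of spnCreateDeleteWait
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Create a test SPN, optionally wait, then delete it. Returns True on success."""
    if not create_spn(spn):
        return False
    if wait_seconds > 0:
        # Give the domain controllers time to apply the create before the delete.
        sleep(wait_seconds)
    return delete_spn(spn)
```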
How to Disable AD or DNS Delegation Validations - Advanced Option
Some environments may require these validations to be disabled. If Infoblox is used with forwarding, the DNS dual delegation validation must be disabled. In other environments, SPNs are not required and AD delegation has not been completed. This option is global and disables these validations on all access zones and pools.
- Log in to Eyeglass as admin using SSH.
- Disable DNS validation:
  igls adv readinessvalidation set --dualdelegation=false
- Disable SPN test validation:
  igls adv readinessvalidation set --spnsdelegation=false