Version: 2.9.0

How to Use Recovery Manager

Introduction

Recovery Manager extends the capabilities of Data Security by automating data recovery in the event of a cyberattack. It is integrated into the Active Events and Event History tabs. Data Security caches audit event history, which allows recovery of data from before an attack was detected. This historical data is accessible through Recovery Manager, where you can perform targeted recoveries using file or path-based filters.

This solution provides full coverage across detection, response, and recovery, automating each step of the attack lifecycle. Recovery Manager highlights recoverable data if a snapshot was taken before the first event affecting the relevant path.

Example

  • A file /ifs/test/document.txt is renamed to /ifs/test/document.locky at 5:00 PM.
  • A snapshot was taken on the path /ifs/test/ at 4:00 PM.
    Result: The file is recoverable.
  • If the only snapshot is from 5:03 PM and no previous snapshot exists, the file is not recoverable.
  • If a snapshot exists on a different path, such as /ifs/otherpath/, but not for /ifs/test/, the file is unrecoverable.

Post-Mortem Analysis and Forensics

Recovery Manager includes an automated quarantine feature that moves affected files into a hidden location for analysis. This not only isolates harmful files from the system but also makes them invisible to end users, while still accessible for future investigation.

Prerequisites

  • Release Version: 2.5.9 or later.
  • Licenses:
    • Requires an Eyeglass DR license for inventory and snapshot management.
    • Requires an Eyeglass RWD license for ransomware event detection.

Data Retention

  • Default Activity: User activity is captured for the 1 hour preceding the event.
  • Retention Period: Data is retained for 7 days after the event. After this period, user activity data for a specific event is automatically purged.

To extend retention to 30 days:

  1. On ECA Node 1, open the configuration file:

    nano /opt/superna/eca/eca-env-common.conf
  2. Add the following line:

    export ECA_KAFKA_USER_TOPIC_RETENTION_DAYS=30
  3. Save and exit by pressing Ctrl + X, then Y and Enter to confirm the write.

  4. Restart the cluster:

    ecactl cluster down
    ecactl cluster up
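As a rough illustration of what the retention setting means for a single event, the sketch below computes the start of the captured activity window and the time after which the activity data becomes eligible for purging. The function and constant names are hypothetical, and the exact moment of purging depends on the system; only the 1-hour, 7-day, and 30-day values come from this page.

```python
from datetime import datetime, timedelta

# Values taken from this page; the names are illustrative only.
ACTIVITY_LOOKBACK = timedelta(hours=1)
DEFAULT_RETENTION_DAYS = 7


def retention_window(event_time: datetime, retention_days: int = DEFAULT_RETENTION_DAYS):
    """Return (start of captured activity, time after which data may be purged)."""
    activity_start = event_time - ACTIVITY_LOOKBACK
    purge_after = event_time + timedelta(days=retention_days)
    return activity_start, purge_after


# Hypothetical event detected on 1 May 2024 at 5:00 PM.
event = datetime(2024, 5, 1, 17, 0)
start, purge = retention_window(event)
_, extended_purge = retention_window(event, retention_days=30)

print("activity captured from:", start)
print("default purge after:   ", purge)
print("30-day purge after:    ", extended_purge)
```

With the default setting, recovery work for this event would need to finish within a week of detection; the 30-day setting moves that deadline to the end of the month.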

How to Use Recovery Manager

Recovery Manager allows you to access and manage file recovery through the Actions Menu of an event in either the Active Events or the Event History tab. Follow these steps to navigate and utilize the Recovery Manager:

Old UI

  1. Access the Recovery Manager:

    • Open the Actions Menu of an event, either from the Active Events tab or from the Event History tab.
    info

    Cyber Recovery Manager Button Availability

    • The button is available for active events and for archived events with an active Kafka topic.
    • For full details, see the Recovery Manager Availability section.
  2. Tree View:

    • The Tree View displays affected paths and files.
    • Use this view to browse the file system and identify which folders and files have been impacted by the attack.
  3. Filters (Optional):

    • Select a cluster and enter a specific path (e.g., /ifs/data/home).
    • Click Search on Path to search through the affected files list for the selected event, showing only files at or below the entered path.
  4. Recovered Status:

    warning

    Snapshot Recovery
    To ensure effective recovery, it is essential to have a snapshot that contains the original file saved prior to any attack. While Data Security will automatically create snapshots when a ransomware event is raised, a good practice is to manually create a snapshot that saves all files needing protection.

    This is especially important for Threat Detectors (TDs) other than TD07, where ransomware events are raised only if sufficient repeated file I/O patterns are recognized (e.g., encrypting an original file to a new file, deleting the original, and renaming the encrypted file).

    During the period before Data Security raises a ransomware event, several files could already be affected, and the snapshots automatically taken at that time may not include all original files. In such cases, having an older manually created snapshot ensures that a recovery option is available.

    info

    TD07 looks specifically for known banned file extensions, such as .locky. This makes TD07 unique compared to other TDs, which focus on file I/O patterns.

    info

    It is possible to adjust the detection sensitivity in Threshold Settings to detect potential ransomware events faster. However, increasing sensitivity may also lead to more false positives.

    • The default filter is RECOVERABLE, which shows all files that have a valid snapshot from before the attack.
    • You can change this filter to:
      • UNRECOVERABLE: Shows files without a snapshot, making them unrecoverable.
      • RECOVERED: Shows files that have already been recovered.
  5. Recovery Statistics:

    • Recovery Manager tracks the status of the recovery process and displays key statistics, including:
      • Total files: Files affected by the user identified in the event during the hour before the event was detected. This value will not change once an event is archived.
      • Recoverable files: Based on snapshot analysis, indicating files that can be restored.
      • Unrecoverable files: Files that cannot be recovered due to missing snapshots.
      • Recovered files: Files that have already been successfully recovered.
  6. All File Activity for the User:

    • The files section displays important information, including:

      • Path to the file.
      • Cluster name.
      • Event actions associated with the file. Click the + button to expand and view all actions and details.
      • Snapshot name to be used for recovery.
      • Date and time the snapshot was taken.
      • Recovery status: A check mark indicates successful recovery.
      info

      Within the navigation bar, other than the pagination options, there are three buttons:

      • Export Grid to CSV - exports the user activity list to a CSV file

      • Refresh Cache - refreshes the cache and recalculates snapshots

      • Refresh - refreshes the user activity list

      warning

      Cache: Clearing the cache is important.

      The cache will contain inaccurate information if:

      • new snapshots are added to inventory

      • more items are added to the Kafka topic

      Click the camera icon at the bottom to clear the cache.

      Eyeglass only becomes aware of new snapshots after inventory is run (for example, during Replication).

      If the Recovery Manager is invoked before inventory is run, it might not display the most accurate snapshots. Refreshing the cache will immediately recalculate snapshots based on the most recent inventory.

  7. File Selection:

    • You can select individual files, click Select page (to select all displayed files), or choose Select All (to select all files across all pages).
    • Use the page navigation buttons to browse through all available files before making a selection.
    info

    It is normal for the list to include both original file names and affected file names, such as file.txt and its affected version file.txt.locky. As a result, the number of files may be higher than expected.

    Recovery Manager chooses the snapshot programmatically for each file and it is possible that not all files will use the same snapshot in the same recovery job.

    Example: If a snapshot includes 10 .txt files and those files were later renamed to .locky, selecting all files and restoring will bring back the original 10 .txt files and remove the renamed .locky versions.

  8. Recovery Process:

    • After selecting files, click the Recover button.
    • A confirmation message will appear—click Yes to proceed.
    • The Show Running Jobs view will display the progress and status of the recovery job.
  9. Recovery Job Details:

    The Recovery Job Details section provides a breakdown of the current recovery job’s status. Each job is displayed in a tree view, showing the following information:

    • State
      • The check mark (✔) indicates that the recovery or quarantine process for the file has been completed successfully.
    • Job Name
      • This column displays the specific task being executed in the recovery job.
      • For example:
        • Quarantine & Restore: Shows the process of moving files to quarantine and restoring the original file from the snapshot.
        • Moving files: Indicates the process of relocating files, such as from the path /ifs/data/demo2/virus, to the quarantine directory.
        • Restoring files: Displays the process of restoring files from the snapshot to their original paths in the production system.
  10. Cyber Recovery Life Cycle:

    • The recovery tool can be used across multiple days to manage and track recovery progress.
    • Add Recovered to the filter list to track files that have been successfully restored (indicated by a check mark).

note

The data in the cache is temporary and is deleted once the retention period ends (7 days by default; see Data Retention). Ensure that recovery efforts are completed before the files in the history cache are purged.

New UI

Alerts & Filters

  1. Access the Alerts Tab:

    • Go to the Alerts section from the sidebar. You can switch between Active and History tabs to monitor real-time or past events.
    • Active alerts display details such as Severity, User, Threat Category, and State.

    Alerts Tab in the New UI

  2. Applying Filters:

    • Use the filters to customize the alert view:
      • Date: Choose a time frame (e.g., Last 7 Days, Past Year).
      • Severity: Filter by Critical, Major, or Warning alerts.
      • User: Filter by the user associated with the event.
      • State: Choose a state from the following list:
        • Monitor
        • Warning
        • Acknowledged
        • To Lockout
        • Locked Out
        • Delayed Lockout
        • Access Restored
        • Recovered
        • Unresolved
        • False Positive
        • Self Recovery
        • Error
    • After selecting the desired filters, click Apply to display the relevant alerts.
  3. Event Details:

    • Each alert shows the following:
      • Severity: Indicates the alert level (Critical, Major, or Warning).
      • User: Displays the user who triggered the event.
      • Threat Category: Describes the type of suspicious activity (e.g., Suspicious Extension).
      • Items Affected: Shows how many items are impacted.
      • Source Cluster: Displays the affected cluster (e.g., ofs97-11).
      • State: Shows the current status of the alert (e.g., Locked Out, Recovered).
      • Detected At: The time the alert was identified.
      • Timer: Any relevant timers or countdowns related to the event.
  4. History Tab:

    • The History tab keeps a record of past events. You can apply filters similar to the Active tab, narrowing down historical alerts by users, severity, and state.
  5. Critical Alerts:

    • Critical alerts lock out the user to prevent further access until the issue is resolved. These alerts are marked with a red badge for urgency.
  6. Job Monitoring (Security Guard Toggle):

    • Toggle Show Security Guard to view the alerts.

Viewing Active Alerts and History Alerts

When you click on an active alert or one from the history, two sections are displayed: Overview and Recovery Manager.

Overview

  1. Alert Information Display:

    • At the top of the screen, you’ll see important details about the alert:
      • Severity: Shows how critical the alert is (Critical, Major, or Warning).
      • User: Displays the associated user.
      • State: The current status of the alert.
      • Detected At: The timestamp when the alert was first detected.
      • Source IP Address: The IP address that triggered the alert.
      • Duration: The total time the alert has been active.
  2. Alert Action Options:

    • Manage the alert using the following options:
      • Create Snapshot: Capture the current system state.
      • Archive as Unresolvable: Archive the alert if it cannot be resolved.
      • Restore User Access: Re-enable the affected user’s access after resolving the issue.
  3. Alert Overview:

    • The Overview section shows a timeline of actions taken during the alert:
      • Recovered: Displays any recovery actions taken.
      • Snapshots: Lists snapshots taken during the event.
      • Shares: Shows the shares impacted by the event.
      • User Alert History: View past alerts associated with the user.

Recovery Manager

Recovery Manager Overview Section

The Recovery Manager allows users to view and manage the recovery process for files impacted by security events. It provides an overview of affected files, recovery status, and filtering options to help navigate large datasets easily. Below is a step-by-step guide to using this feature.

Viewing Recovery Statistics

Viewing Recovery Statistics in Recovery Manager

At the top of the Recovery Manager, users can see a summary of the recovery process:

  • Total: Displays the total number of files impacted by the event.
  • Recoverable: Indicates the percentage of files that can be recovered based on available snapshots.
  • Unrecoverable: Shows the percentage of files that cannot be recovered due to missing snapshots.
  • Recovered: Tracks the percentage of files already restored.

Filtering and Selecting Files

Filtering and Selecting Files in Recovery Manager

To efficiently manage large numbers of affected files, the Recovery Manager offers powerful filtering options:

  • Cluster: Filter the list of files by selecting the cluster associated with the files.
  • Path: Narrow down the list by specifying a path to focus on files in a particular directory.
  • Recovery Status: Choose from statuses such as recoverable or unrecoverable to see the current state of the files.
  • Event Type: Filter based on actions taken on the file, such as file creation or renaming.
  • Event Time: Filter files by the time of the event to focus on a specific period.
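The path filter can be thought of as a match on whole path components: a file passes if it sits at or below the entered path. The sketch below is only an illustration of that idea; the function name is hypothetical and it does not describe how Recovery Manager implements the filter internally.

```python
from pathlib import PurePosixPath


def is_at_or_below(file_path: str, filter_path: str) -> bool:
    """True if file_path equals filter_path or lies somewhere beneath it."""
    file_p = PurePosixPath(file_path)
    base = PurePosixPath(filter_path)  # a trailing slash is normalized away
    return file_p == base or base in file_p.parents


# Hypothetical affected files for one event.
affected = [
    "/ifs/data/home/alice/report.txt",
    "/ifs/data/homework/notes.txt",
    "/ifs/data/projects/plan.txt",
]

matches = [p for p in affected if is_at_or_below(p, "/ifs/data/home/")]
print(matches)
```

Comparing path components rather than raw string prefixes keeps /ifs/data/homework out of a search on /ifs/data/home.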
File Tree View

File Tree View in Recovery Manager

On the left-hand side, the File Tree View provides a directory-based navigation system. Users can expand directories to explore specific paths and see which files were impacted within each folder.

Viewing File Activity

Detailed File Activity in Recovery Manager

Each file in the recovery list includes detailed information to help users make informed decisions during the recovery process:

  • Path: The file path where the affected file is located.
  • Cluster: The cluster associated with the file.
  • Event Type: The action that occurred on the file, such as file creation or renaming.
  • File Name: The name of the affected file.
  • Recovery Source: The snapshot or source being used to restore the file.
  • Recovery Time: The time required to complete the recovery process.
  • Status: The current status of the recovery process, such as whether the file is recoverable or if the recovery is in progress.

Recalculate Snapshots

Recalculate Snapshots Option in Recovery Manager

If the available snapshots for a file have changed or new data is available, users can click Recalculate Snapshots. This will refresh the snapshot data for the selected files and provide updated recovery points. It ensures that the latest snapshots are used during recovery, improving the chances of a successful restore.

Export as CSV

Export as CSV Option in Recovery Manager

Users can export the list of affected files and their recovery statuses to a CSV file for external reporting, auditing, or tracking purposes.

Recover Files

File Recovery Selection

File Recovery Confirmation

After selecting the files you wish to recover, follow these steps:

  1. Select Files: Check the box next to each file you want to recover. You can select multiple files at once.

  2. Recover: Click the Recover button at the bottom of the screen. A confirmation dialog will appear, prompting you to confirm the recovery action.

  3. Confirmation: Once confirmed, each selected file is restored from the most recent available snapshot taken before the first malicious event on that file, bringing it back to its last known good state before the event.

Recovery Manager Availability

Active Events

  • The Cyber Recovery Manager button is always displayed in the event Actions window for active events.
  • If the associated Kafka topic for an active event is deleted, the button will still remain available.

Archived Events in the History Tab

  • For archived events in the History tab, the Cyber Recovery Manager button is shown only if a Kafka topic exists.
  • The button will not appear in the History tab if the Kafka topic has been deleted.

Understanding Kafka Topics

A Kafka topic is a message queue where audit records for each event are temporarily stored. When a new event or alert arrives in Ransomware Defender (RWD), recent audit logs are searched to gather activity from the past hour for the event user. This information is then saved in a Kafka topic.

Kafka Topic Retention:

  • Default Retention Period: Kafka topics are retained for seven days by default. This applies to both active and archived events.
  • Configurable Retention: Adjust the retention period by setting ECA_KAFKA_USER_TOPIC_RETENTION_DAYS in the eca-env-common.conf file.

After the retention period, Kafka topics are automatically deleted. For archived events, this means the Recovery Manager button will no longer appear in the History tab of the Actions window, and the User Activity window will display only the tree view sample instead of a comprehensive list of file activity.
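The availability rules in this section reduce to a small decision, sketched below. The function and parameter names are hypothetical and serve only to summarize the rules stated above.

```python
def recovery_manager_button_visible(event_is_active: bool, kafka_topic_exists: bool) -> bool:
    """Summarize when the Cyber Recovery Manager button is shown."""
    if event_is_active:
        # Active events always show the button, even if the topic was deleted.
        return True
    # Archived events show the button only while the Kafka topic still exists.
    return kafka_topic_exists


for active in (True, False):
    for topic in (True, False):
        visible = recovery_manager_button_visible(active, topic)
        print(f"active={active!s:5} topic_exists={topic!s:5} -> button shown: {visible}")
```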

How to Complete Forensics of Quarantine Data

The following points describe how compromised files are quarantined during recovery and how to make the quarantined data available for forensic analysis:

  1. All compromised files are moved during recovery to a quarantine location in /ifs/.ransomwaredefender/corrupted/.
  2. Under this location, a subfolder named after each user's SID is created, and the full relative path to each file is maintained, allowing for targeted scanning with security or decryption tools.
  3. This structure also serves as a map of the attacked data, enabling the cyber recovery team to analyze the attack pattern within the file system.
  4. The quarantine location is secured outside of any SMB or NFS exports, keeping the data hidden in a protected area.
  5. If necessary, create an SMB share with read-only access for cyber recovery teams to analyze the quarantined data.
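To help locate a specific file in quarantine, the sketch below builds the expected quarantine path from a user SID and the original file path. It assumes the relative path is taken relative to /ifs, which this page does not state explicitly; the SID and the function name are hypothetical. Verify the layout on your own cluster before relying on it.

```python
from pathlib import PurePosixPath

# Quarantine root as documented on this page.
QUARANTINE_ROOT = PurePosixPath("/ifs/.ransomwaredefender/corrupted")


def quarantine_path(user_sid: str, original_path: str, base: str = "/ifs") -> PurePosixPath:
    """Build <quarantine root>/<SID>/<path relative to base> (base is an assumption)."""
    relative = PurePosixPath(original_path).relative_to(base)
    return QUARANTINE_ROOT / user_sid / relative


# Hypothetical SID and file.
located = quarantine_path("S-1-5-21-1111-2222-3333-1001", "/ifs/test/document.locky")
print(located)
```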

How Recovery Manager Determines Which Version to Restore

When Data Security detects activity by a bad actor, it raises a ransomware event.

Once the event occurs, Recovery Manager tracks all events by the bad actor starting from 1 hour before the event.

Recovery Manager compiles a list of all files—or objects (in the case of AWS/ECS)—that the bad actor modified. Modifications include:

  • Writing to files or objects,
  • Creating new ones,
  • Deleting existing ones,
  • Renaming them.

For each file or object, Recovery Manager refers to the snapshot that was created before the event and selects the version of the file from this snapshot for recovery. This version comes from before the first modification by the bad actor, within the one hour before the ransomware event.
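The tracking step described above can be sketched as follows: collect the bad actor's modifying events from one hour before the ransomware event onward, and record the first modification time per file. The record layout, action names, and function name are assumptions made for this illustration, not the product's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AuditEvent:
    user: str
    path: str
    action: str  # hypothetical action names: "write", "create", "delete", "rename", "read"
    time: datetime


MODIFYING_ACTIONS = {"write", "create", "delete", "rename"}


def first_modifications(events, user, event_time, lookback=timedelta(hours=1)):
    """Map each modified path to the time of the user's first modification in the window."""
    window_start = event_time - lookback
    first = {}
    for ev in events:
        if ev.user != user or ev.action not in MODIFYING_ACTIONS:
            continue
        if ev.time < window_start:
            continue  # older than one hour before the ransomware event
        if ev.path not in first or ev.time < first[ev.path]:
            first[ev.path] = ev.time
    return first


raised_at = datetime(2024, 5, 1, 17, 5)
events = [
    AuditEvent("mallory", "/ifs/test/document.txt", "write", datetime(2024, 5, 1, 16, 50)),
    AuditEvent("mallory", "/ifs/test/document.txt", "rename", datetime(2024, 5, 1, 17, 0)),
    AuditEvent("mallory", "/ifs/test/old.txt", "write", datetime(2024, 5, 1, 15, 0)),
    AuditEvent("mallory", "/ifs/test/readme.txt", "read", datetime(2024, 5, 1, 16, 55)),
    AuditEvent("alice", "/ifs/test/notes.txt", "write", datetime(2024, 5, 1, 16, 58)),
]

tracked = first_modifications(events, "mallory", raised_at)
print(tracked)
```

Only document.txt is tracked here: the write to old.txt falls outside the one-hour window, the read is not a modification, and notes.txt belongs to a different user.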

PowerScale Case: Finding the Correct Snapshot for Recovery

Recovery Manager uses the following criteria to determine the best snapshot for recovery:

  1. Snapshot Location:

    • The snapshot must cover the path where the file is located (for example, a snapshot on /ifs/test/ covers /ifs/test/document.txt).
  2. Snapshot Timing:

    • The timestamp of the snapshot must be earlier than the timestamp of the first event initiated by the bad actor on that file, as tracked by Recovery Manager.
  3. Best Snapshot Selection:

    • Among all snapshots that meet the above criteria, the best snapshot is the one that is closest in time to the first event on the file. This ensures the most recent and relevant version of the file is recovered.
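The three criteria above can be sketched as a small selection function. This is a minimal illustration rather than Recovery Manager's actual implementation: the Snapshot type, the snapshot names, and the assumption that a snapshot covers every file beneath its path are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePosixPath
from typing import Optional, Sequence


@dataclass(frozen=True)
class Snapshot:
    name: str
    path: str
    taken_at: datetime


def covers(snapshot_path: str, file_path: str) -> bool:
    """Assumption: a snapshot covers any file located beneath its path."""
    return PurePosixPath(snapshot_path) in PurePosixPath(file_path).parents


def select_best_snapshot(
    snapshots: Sequence[Snapshot], file_path: str, first_event_time: datetime
) -> Optional[Snapshot]:
    """Pick the latest snapshot that covers the file and predates its first event."""
    candidates = [
        s for s in snapshots
        if covers(s.path, file_path) and s.taken_at < first_event_time
    ]
    return max(candidates, key=lambda s: s.taken_at, default=None)


first_event = datetime(2024, 5, 1, 17, 0)  # file renamed at 5:00 PM
snapshots = [
    Snapshot("snap-1500", "/ifs/test/", datetime(2024, 5, 1, 15, 0)),
    Snapshot("snap-1600", "/ifs/test/", datetime(2024, 5, 1, 16, 0)),
    Snapshot("snap-1703", "/ifs/test/", datetime(2024, 5, 1, 17, 3)),
    Snapshot("snap-other", "/ifs/otherpath/", datetime(2024, 5, 1, 16, 30)),
]

best = select_best_snapshot(snapshots, "/ifs/test/document.txt", first_event)
print(best.name if best else "unrecoverable")  # prints snap-1600
```

If only the 5:03 PM snapshot or only the /ifs/otherpath/ snapshot were available, the function would return None, which corresponds to an unrecoverable file.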

Snapshot Recovery Example

Let’s consider the case of a file called document.txt located at /ifs/test.

  • A bad actor renames the file to document.locky at 5:00 PM.

Recoverable Scenario

  • A snapshot was taken on the path /ifs/test/ before the file was renamed, at 4:00 PM.
    • In this case, the original document.txt file is recoverable because the snapshot predates the malicious change.

Not Recoverable Scenario 1

  • A snapshot was taken after the file was renamed, at 5:03 PM, and no prior snapshot exists on the system that covers the /ifs/test path.
    • In this case, the document.txt file is not recoverable because no valid snapshot exists from before the rename.

Not Recoverable Scenario 2

  • The only snapshot on the system is for a different path, /ifs/otherpath/, and no snapshot covers /ifs/test.
    • The document.txt file is not recoverable because no relevant snapshot exists for its directory.

note

If a previous ransomware event corrupted the file, and the administrator acknowledged and archived that event and then restored the file to a previous version, the most recent snapshot may still contain the corrupted version if another ransomware event is raised within the hour.

AWS/ECS Case: Recovering an Object

To recover an object, bucket versioning must be enabled. Recovery Manager will restore the most recent version from before the bad actor's first tracked modification.

info

To display the “Earliest Recoverable Version Time” column in Recovery Manager, the Eyeglass /opt/superna/sca/data/system.xml file must contain the versioningEnabled property (inside the <process> tag) set to "true".

  • If versioningEnabled is set to false, the Bucket Versioning Status column will be displayed:

Bucket Versioning                         Bucket Versioning Status
OFF                                       Disabled
ON                                        Enabled
SUSPENDED (was updated from ON to OFF)    Suspended

  • If versioningEnabled is set to true, the Earliest Recoverable Version Time column will be displayed:

Bucket Versioning                         Earliest Recoverable Version Time
OFF                                       NO_VERSIONS_EXIST
ON                                        <date_time_stamp>
SUSPENDED (was updated from ON to OFF)    NO_VERSIONS_EXIST

For example:

  • A version of Object A exists in a bucket that has Versioning enabled.

  • A bad actor triggers a ransomware event.

  • Recovery Manager has access to the version of the object that existed before the ransomware event, so that version can be restored.

If no such version exists, the object will be quarantined.
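The object case can be sketched in the same spirit: choose the newest version that predates the bad actor's first tracked modification, and fall back to quarantine when none exists. The ObjectVersion type, version IDs, and function names are hypothetical; this is an illustration of the rule, not the product's code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class ObjectVersion:
    version_id: str
    last_modified: datetime


def select_version(
    versions: Sequence[ObjectVersion], first_modification: datetime
) -> Optional[ObjectVersion]:
    """Newest version written before the first tracked modification, if any."""
    earlier = [v for v in versions if v.last_modified < first_modification]
    return max(earlier, key=lambda v: v.last_modified, default=None)


def plan_recovery(versions: Sequence[ObjectVersion], first_modification: datetime) -> str:
    chosen = select_version(versions, first_modification)
    return f"restore {chosen.version_id}" if chosen else "quarantine"


first_mod = datetime(2024, 5, 1, 17, 0)
versions = [
    ObjectVersion("v1", datetime(2024, 4, 30, 9, 0)),
    ObjectVersion("v2", datetime(2024, 5, 1, 12, 0)),
    ObjectVersion("v3", datetime(2024, 5, 1, 17, 0)),  # written by the bad actor
]

print(plan_recovery(versions, first_mod))      # prints restore v2
print(plan_recovery(versions[2:], first_mod))  # prints quarantine
```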

note

If a ransomware event is raised for a user and then the same user triggers another ransomware event, there is a small chance that the corrupted version might be chosen for restoration.

This may happen if:

  • The second ransomware event occurred within 1 hour of the previous event, and the file was corrupted 1 hour before the second event, and the bad actor modified the file within 1 hour before the second event.

    OR

  • The second ransomware event occurred 1 hour after the first event, and the object was restored within 1 hour before the second event, and the bad actor wrote to the object again before it was restored.

info

Recovery can mean one of two things:

  • restored (if the file does not have a banned extension, it is placed back in the share or bucket)

  • deleted (if the file has a banned extension, it is not placed back in the share or bucket)

For example, .locky files will not be restored.

ECS Differences


info

  • Bucket users must be added to Inventory for their activity to appear in Cyber Recovery Manager (CRM).

  • ECS currently does not have a view in the new Cyber Recovery Manager UI.

This section highlights the key differences between ECS and other platforms when using Cyber Recovery Manager. While the UI for Cyber Recovery Manager in ECS closely mirrors the PowerScale OneFS interface, there are a few important distinctions.

  • In ECS, instead of displaying cluster information, the interface shows bucket and region details, along with versioning information for the bucket.
  • The most significant difference between ECS and other platforms lies in the system.xml parameter <versioningEnabled>. This parameter can be set to either true or false:
    • When set to true, the Earliest Recoverable Version Time is displayed for the bucket, but UI loading times may be significantly slower.
    • When set to false, the Bucket Versioning Status is displayed instead.
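The interaction between versioningEnabled and the bucket's versioning state, as listed in the tables under AWS/ECS Case: Recovering an Object, can be summarized as a lookup. The function name and the string values used for the bucket state are assumptions made for this sketch.

```python
from typing import Optional, Tuple

STATUS_LABELS = {"OFF": "Disabled", "ON": "Enabled", "SUSPENDED": "Suspended"}


def versioning_column(
    versioning_enabled: bool,
    bucket_versioning: str,
    earliest_version_time: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Return (column name, displayed value) for a bucket in Recovery Manager."""
    if versioning_enabled:
        column = "Earliest Recoverable Version Time"
        # Only buckets with versioning ON show a timestamp.
        value = earliest_version_time if bucket_versioning == "ON" else "NO_VERSIONS_EXIST"
    else:
        column = "Bucket Versioning Status"
        value = STATUS_LABELS[bucket_versioning]
    return column, value


print(versioning_column(False, "SUSPENDED"))
print(versioning_column(True, "OFF"))
print(versioning_column(True, "ON", "2024-05-01 16:00"))
```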

Basic Setup

  1. Enable Versioning
    In S3 Browser, enable versioning on the bucket you are testing.

  2. View File Versions
    Ensure you can view file versions to confirm versioning is correctly set up.

  3. Eyeglass Configuration
    In Eyeglass, make sure your user is added to the node in Inventory View.

  4. system.xml versioningEnabled Property
    This property, located within the <process> tag in /opt/superna/sca/data/system.xml, interacts with the bucket versioning status and influences the Recovery Manager UI.

tip

For more details, refer to the AWS/ECS Case: Recovering an Object section.

Sorting and Filtering

  • Similar to PowerScale OneFS, Recovery Items in ECS are sorted based on the timestamp of the first event (the oldest) associated with a Recovery Item.
  • Recovery Items can also be filtered by bucket, path, and status.

Status Options

  • When <versioningEnabled> is true, the status options include:

    • RECOVERED
    • RECOVERABLE
    • NO_RECOVERABLE_VERSION
    • NO_VERSIONS_EXIST
    • UNKNOWN
  • When <versioningEnabled> is false, the status options are:

    • RECOVERED
    • UNRECOVERED

Submitting a Recovery Job

When submitting a recovery job in ECS, it's important to note the following:

info

ECS does not have a RENAME_OBJECT event. Instead, it combines two events: DELETE_OBJECT and COPY_OBJECT. This means that the system deletes the old file and creates (copies) a new file with the updated name.

To ensure complete recovery, we recommend selecting both of these events when submitting a recovery job.

Example

  • Original file: present1.locky
  • Renamed to: present2.locky

In this case, the DELETE_OBJECT event removes present1.locky, and the COPY_OBJECT event creates present2.locky. By selecting both events, you ensure that the renamed file is included in the recovery process.
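To check that both halves of a rename are included in a selection, the two events can be paired up, as in the sketch below. The event record and its source_key field are hypothetical; this page does not describe how ECS audit events identify the source of a copy, so treat the pairing rule as an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ObjectEvent:
    event_type: str                   # "DELETE_OBJECT" or "COPY_OBJECT"
    key: str                          # object the event applies to
    source_key: Optional[str] = None  # hypothetical: object a copy was made from


def rename_pairs(events: List[ObjectEvent]) -> List[Tuple[ObjectEvent, ObjectEvent]]:
    """Pair each COPY_OBJECT with the DELETE_OBJECT of its source, if present."""
    deleted = {e.key: e for e in events if e.event_type == "DELETE_OBJECT"}
    pairs = []
    for e in events:
        if e.event_type == "COPY_OBJECT" and e.source_key in deleted:
            pairs.append((deleted[e.source_key], e))
    return pairs


events = [
    ObjectEvent("COPY_OBJECT", "present2.locky", source_key="present1.locky"),
    ObjectEvent("DELETE_OBJECT", "present1.locky"),
]

for delete_ev, copy_ev in rename_pairs(events):
    print(f"rename: {delete_ev.key} -> {copy_ev.key} (select both events)")
```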
