Version: 2.9.0

How to Bulk Ingest Auditing Data

Introduction

This guide provides instructions for re-ingesting historical audit data from PowerScale’s audit directory into Easy Auditor’s index, using the Bulk Ingest feature. Bulk Ingest ensures that any unprocessed, compressed logs are captured and indexed, allowing Easy Auditor to maintain a complete record of user activity.

Requirements and Limitations

Requirements

  • Targeted Date Event: Ingestion targets a specific, targeted event date; the feature is not designed to ingest long time frames (weeks, months, or years) in a single job. Files can be selected for submission only within a window covering the previous 3 days.
  • Limited File Handling: The maximum number of .gz files that can be ingested per job is 20. For initial testing, it is recommended to use only one file.
  • Job Scheduling: Ingestion jobs should be scheduled to run during non-peak hours to avoid performance issues, as the system prioritizes active audit data.
  • Job Queueing: When submitting more than one ingestion job, they will be queued, and only one will execute at a time.

Limitations

  • Background Processing: Bulk ingestion runs as a background task and is not supported under the standard support contract.
  • Unpredictable Duration: There is no way to predict how long bulk ingestion will take, because processing of active audit data always takes precedence over bulk ingestion tasks.
  • Concurrency Limitation: Only one ingestion job runs at a time; any additional jobs are queued.
  • Non-changeable Priority: The priority of bulk ingestion jobs cannot be changed; they always run at a lower priority than active audit data processing.

How to Bulk Ingest Data

Set Up NFS Ingestion

Before using the Bulk Ingest feature, you must set up NFS ingestion on ECA clusters.

  1. Create a directory for Bulk Ingestion in the ECA Cluster

    On ECA node 1, run the following command to create a directory for Bulk Ingestion.

    ecactl cluster exec "sudo mkdir -p /opt/superna/mnt/bulkingestion/<cluster-guid>/<cluster-name>"

    Replace <cluster-guid> with the unique identifier for your Isilon cluster and <cluster-name> with your cluster's name.

    If the directory already exists, you can skip this step.
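
    For example, with a hypothetical cluster GUID of 005056a91234abcd and a cluster named prod-cluster, the command would be:

    ecactl cluster exec "sudo mkdir -p /opt/superna/mnt/bulkingestion/005056a91234abcd/prod-cluster"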

  2. Configure the Bulk Ingestion Path

    To tell Easy Auditor where to look for the audit logs when using a non-default path, use the Eyeglass CLI to configure the path:

    igls config settings set --tag=bulkingestpath --value=<PATH>
    • Replace <PATH> with the full directory path on PowerScale where the .gz audit log files will be saved.
    • Make sure the .gz files are stored in this specific directory on PowerScale.

    note

    When using a non-default path, follow this required directory structure: /PATH/node-name/protocol

    Each node must have its own subdirectory within <PATH>, named after the node and containing a protocol folder. Store all .gz files within the protocol folder of each node’s directory.
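
    For example, assuming a hypothetical custom path of /ifs/data/bulkingest on a cluster with nodes named node-1 and node-2, the configuration command and resulting layout would look like this (file names are illustrative):

    igls config settings set --tag=bulkingestpath --value=/ifs/data/bulkingest

    /ifs/data/bulkingest/node-1/protocol/audit_0001.gz
    /ifs/data/bulkingest/node-2/protocol/audit_0001.gz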

  3. Create an NFS Export in PowerScale

    If an export already exists on the path, then no further action is required.

    For a non-default path, create an NFS export in PowerScale using this command:

    isi nfs exports create <PATH> --root-clients="<eca-ips>" --clients="<eca-ips>" --read-only=true -f --description "Bulk ingest export" --all-dirs true
    • Replace <PATH> with the custom path configured in Step 2.
    • Add the IP addresses of all ECA nodes to the export's client list to grant access.
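
    For example, with the hypothetical path /ifs/data/bulkingest and ECA nodes at 10.1.1.11, 10.1.1.12, and 10.1.1.13, the command would look like this:

    isi nfs exports create /ifs/data/bulkingest --root-clients="10.1.1.11,10.1.1.12,10.1.1.13" --clients="10.1.1.11,10.1.1.12,10.1.1.13" --read-only=true -f --description "Bulk ingest export" --all-dirs true
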
  4. Edit the Auto NFS Configuration:

    On each ECA node, edit the file:

    /opt/superna/eca/data/audit-nfs/auto.nfs

    Add the following line to configure the NFS mount:

    /opt/superna/mnt/bulkingestion/<cluster-guid>/<cluster-name> --fstype=nfs,nfsvers=4,ro,soft <FQDN>:<PATH>

    Replace <cluster-guid>, <cluster-name>, <FQDN>, and <PATH> with the values specific to your configuration.
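
    For example, with a hypothetical cluster GUID of 005056a91234abcd, cluster name prod-cluster, SmartConnect FQDN cluster.example.com, and export path /ifs/data/bulkingest, the entry would read:

    /opt/superna/mnt/bulkingestion/005056a91234abcd/prod-cluster --fstype=nfs,nfsvers=4,ro,soft cluster.example.com:/ifs/data/bulkingest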

  5. Mount the export on ECA Nodes:

    Run the following command on each ECA node to mount the NFS export:

    sudo /opt/superna/eca/scripts/manual-mount.sh

    Alternatively, you can run a cluster shutdown or startup command to trigger automatic mounting.
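
    To verify that the export mounted successfully, you can check the mount table on each node (a standard Linux check, not a Superna-specific command):

    mount | grep bulkingestion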

Steps to Bulk Ingest Data

  1. Open Easy Auditor Module

    From the Eyeglass desktop or main menu, launch the Easy Auditor module.

  2. Access Bulk Ingest

    In the left sidebar, under Active Auditing, select the Bulk Ingest tab.

  3. Select the Cluster

    Choose the desired cluster to begin the audit data ingestion. Ensure the cluster selected aligns with the data you intend to process.

  4. Set Date Range for Audit Data Search

    • In the Start Time field, specify the most recent date from which the search should work backward.
    • From the Search Previous dropdown, select a maximum duration of 3 Days to fetch archived audit logs.
    • Click Submit Settings to apply the date filter on the chosen cluster.
  5. Select Files for Ingestion

    • After filtering, the File Selector panel on the right will display available files based on the specified date range.
    • Use the checkboxes to select each file you want to include in the ingestion job.
  6. Submit Selected Files

    Once you have chosen the files, click the Submit Files button.

  7. Run the Ingestion Job

    Initiate the job by selecting the Run button in the lower right corner of the interface. The job will commence and begin processing the selected files.

  8. Queue Additional Jobs (Optional)

    You may configure a new job and select additional files for ingestion without waiting for the current job to complete. While only one job runs concurrently (due to reserved ingestion resources), subsequent jobs will queue automatically and start once the active job finishes.

View Progress of Bulk Data Ingestion

note

Each PowerScale node contains historical audit data, and each file is approximately 1GB in size when compressed. Each node may have multiple files to ingest for a given day. Due to this volume, ingestion can be a slow process, particularly when processing multiple days of historical data.

  1. Start the Queue Monitor Process

    Log in to node 1 of the ECA cluster and execute the following command to start the queue monitor:

    ecactl containers up -d kafkahq
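
    If you want to confirm the KafkaHQ container started, and assuming the ECA nodes run Docker (typical for ECA deployments), a standard Docker check is:

    docker ps --filter name=kafkahq
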
  2. View Ingestion Jobs

    Open the Jobs icon on the Eyeglass interface and select the Running Jobs tab to view active ingestion tasks.

    Key points to monitor:

    • Running Jobs: This screen shows the active ingestion tasks. Wait for a job to finish before submitting new files for ingestion.
    • Spark Job Status: Ensure the Wait for Spark Job step displays a blue checkmark, indicating the job has completed. A spinning symbol next to the job means it is still in progress.
  3. Monitor Event Ingestion Progress

    • Access KafkaHQ:
      Using a browser, navigate to http://<node-1-IP>/kafkahq (replace <node-1-IP> with the IP address of node 1 in the ECA cluster). Log in using the ecaadmin username and the default password 3y3gl4ss.

    • Kafka Topics Overview:

      • Topic Tracking: Locate the topic named "bulkingestion" to monitor the progress of the ingestion task.
      • Lag: This value varies with ingestion speed. A lag of 0 indicates the current ingestion job has finished, with no additional files being processed.
      • Count: This field will increase as new events are processed across all jobs. As new JSON files are added to the ingestion queue, this value will reflect the total events being ingested.
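
      For context, Kafka lag is simply the difference between the latest offset written to a topic and the offset the consumer has processed. With illustrative numbers: if the bulkingestion topic's count shows 1,200,000 events and the ingestion consumer has processed 1,195,500 of them, the lag is 4,500; once the consumer catches up, the lag returns to 0.
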
  4. View in-progress files

    Open the Easy Auditor module from the Eyeglass desktop or main menu. In the left sidebar, under Report, select Finished Reports. Here, you can view the history of all current and previous bulk ingestion jobs. By selecting View on a report, you can see the status (Not Started, Running, Success) of individual files within that bulk ingestion job.

See Also

For instructions on defining and submitting audit queries, see the Submit Audit Queries document.