Version: 2.12.0 🚧

Bulk Ingest Operations

Introduction

This guide explains how to re-ingest historical audit data from PowerScale’s audit directory into Easy Auditor’s index using the Bulk Ingest feature. Bulk Ingest captures and indexes any unprocessed compressed logs, so that Easy Auditor maintains a complete record of user activity.

Requirements and Limitations

  • See Configuration for the requirements and limitations that apply to Bulk Ingest.

How to Bulk Ingest Data

Steps to Bulk Ingest Data

  1. Open Easy Auditor Module

    From the Eyeglass desktop or main menu, launch the Easy Auditor module.

  2. Access Bulk Ingest

    In the left sidebar, under Active Auditing, select the Bulk Ingest tab.

  3. Select the Cluster

    Choose the cluster whose audit data you want to ingest.

  4. Set Date Range for Audit Data Search

    • In the Start Time field, specify the most recent date to include; the search runs backward from this date.
    • From the Search Previous dropdown, select how far back from the Start Time to search for archived audit logs (maximum 3 Days).
    • Click Submit Settings to apply the date filter on the chosen cluster.
  5. Select Files for Ingestion

    • After filtering, the File Selector panel on the right displays the files available for the specified date range.
    • Use the checkboxes to select each file you want to include in the ingestion job.
  6. Submit Selected Files

    Once you have chosen the files, click the Submit Files button.

  7. Run the Ingestion Job

    Click the Run button in the lower-right corner of the interface to start the job, which then begins processing the selected files.

  8. Queue Additional Jobs (Optional)

    You can configure a new job and select additional files for ingestion without waiting for the current job to complete. Only one job runs at a time because ingestion resources are reserved for the active job; subsequent jobs queue automatically and start once the active job finishes.
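Steps 4 and 8 describe two behaviors that are easy to misread: the date filter searches backward from the Start Time for at most 3 days, and jobs run one at a time with later jobs waiting their turn. The following Python sketch models both under stated assumptions. It is not Eyeglass code: the names search_window, IngestQueue, and the job labels are hypothetical, the first-in, first-out queue order is assumed, and the 1-to-3 whole-day range for Search Previous is an assumption based on the 3-day maximum above.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Deque, Optional, Tuple

MAX_SEARCH_DAYS = 3  # Search Previous allows at most 3 days


def search_window(start_time: datetime, days_back: int) -> Tuple[datetime, datetime]:
    """Return (oldest, newest): the search runs backward from start_time."""
    # Assumption: the dropdown offers whole days from 1 up to the 3-day maximum.
    if not 1 <= days_back <= MAX_SEARCH_DAYS:
        raise ValueError(f"days_back must be between 1 and {MAX_SEARCH_DAYS}")
    return start_time - timedelta(days=days_back), start_time


class IngestQueue:
    """Hypothetical model of scheduling: one active job, the rest wait in FIFO order."""

    def __init__(self) -> None:
        self.active: Optional[str] = None
        self.waiting: Deque[str] = deque()

    def submit(self, job: str) -> None:
        # Start immediately if nothing is running; otherwise queue behind the active job.
        if self.active is None:
            self.active = job
        else:
            self.waiting.append(job)

    def finish_active(self) -> Optional[str]:
        # When the active job finishes, promote the next queued job (if any).
        self.active = self.waiting.popleft() if self.waiting else None
        return self.active


# Usage: a 3-day search ending on 10 May 2024, then two jobs submitted back to back.
oldest, newest = search_window(datetime(2024, 5, 10), 3)  # oldest is 7 May 2024
queue = IngestQueue()
queue.submit("job-A")  # starts immediately
queue.submit("job-B")  # waits until job-A finishes
```

Modeling the queue as a single active slot plus a deque mirrors the statement that ingestion resources are reserved for one job: submitting never interrupts the running job, it only appends to the waiting line.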

View Progress of Bulk Data Ingestion

Note: Each PowerScale node stores historical audit data, and each compressed file is approximately 1 GB. A node may have several files to ingest for a single day, so ingestion can be slow, particularly when processing multiple days of historical data.
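To gauge how much data a job involves before submitting it, the figures in the note reduce to simple multiplication. The Python sketch below is a back-of-the-envelope estimate only: the node count and files-per-node-per-day values are hypothetical inputs you would read from the File Selector panel, and the 1 GB per file is the approximate size stated above, not an exact figure.

```python
from typing import Tuple

APPROX_GB_PER_FILE = 1.0  # each compressed audit file is roughly 1 GB


def estimate_ingest_volume(nodes: int, files_per_node_per_day: int, days: int) -> Tuple[int, float]:
    """Return (file_count, approximate_compressed_gb) for a bulk ingest job."""
    file_count = nodes * files_per_node_per_day * days
    return file_count, file_count * APPROX_GB_PER_FILE


# Example: a hypothetical 4-node cluster with 2 files per node per day, over 3 days.
files, gb = estimate_ingest_volume(nodes=4, files_per_node_per_day=2, days=3)
print(f"{files} files, ~{gb:.0f} GB compressed")  # prints "24 files, ~24 GB compressed"
```

Because the volume grows linearly with each factor, doubling the search range or the node count doubles the data to ingest, which is why multi-day jobs take noticeably longer.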

  1. Start the Queue Monitor Process

    Log in to node 1 of the ECA cluster and run the following command to start the queue monitor:

    ecactl containers up -d kafkahq
  2. View Ingestion Jobs

    Click the Jobs icon in the Eyeglass interface and select the Running Jobs tab to view active ingestion tasks.

    Key points to monitor:

    • Running Jobs: This screen shows the active ingestion tasks. Wait for a job to finish before submitting new files for ingestion.
    • Spark Job Status: A blue checkmark on the Wait for Spark Job step indicates the job has completed; a spinning icon next to the job means it is still in progress.
  3. Monitor Event Ingestion Progress

    • Access KafkaHQ:
      Using a browser, navigate to http://<node-1-IP>/kafkahq (replace <node-1-IP> with the IP address of node 1 in the ECA cluster). Log in with the ecaadmin username and the default password 3y3gl4ss.

    • Kafka Topics Overview:

      • Topic Tracking: Locate the topic named "bulkingestion" to monitor the progress of the ingestion task.
      • Lag: The number of events in the topic that have not yet been consumed; this value varies with ingestion speed. A lag of 0 indicates the current ingestion job has finished and no additional files are being processed.
      • Count: The total number of events processed across all jobs. This value increases as new JSON files are added to the ingestion queue and their events are ingested.
  4. View In-Progress Files

    Open the Easy Auditor module from the Eyeglass desktop or main menu. In the left sidebar, under Report, select Finished Reports. This page lists all current and previous bulk ingestion jobs. Select View on a report to see the status (Not Started, Running, Success) of each file in that job.
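The two progress signals described above, the lag on the bulkingestion topic in KafkaHQ and the per-file status under Finished Reports, can be combined as in the following Python sketch. It is not part of Eyeglass or KafkaHQ: the offsets and status lists are values you would read off those screens by hand, and the helper names are hypothetical. Lag is computed the standard Kafka way, as each partition's log-end offset minus the consumer group's committed offset, summed over partitions; treating a partition with no committed offset as unread from offset 0 is a simplifying assumption.

```python
from collections import Counter
from typing import Dict, Iterable

FILE_STATUSES = ("Not Started", "Running", "Success")


def total_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> int:
    """Sum of (log-end offset - committed offset) over all partitions."""
    # Assumption: a partition with no committed offset is treated as unread from 0.
    return sum(end - committed.get(partition, 0) for partition, end in end_offsets.items())


def summarize_files(statuses: Iterable[str]) -> Dict[str, int]:
    """Count files per status, as listed under Finished Reports > View."""
    counts = Counter(statuses)
    unexpected = set(counts) - set(FILE_STATUSES)
    if unexpected:
        raise ValueError(f"unexpected status values: {sorted(unexpected)}")
    # Counter returns 0 for statuses that never appeared.
    return {status: counts[status] for status in FILE_STATUSES}


def looks_finished(lag: int, statuses: Iterable[str]) -> bool:
    """Hypothetical check: zero lag and at least one file, all reported as Success."""
    status_list = list(statuses)
    return lag == 0 and bool(status_list) and all(s == "Success" for s in status_list)
```

For example, total_lag({0: 100, 1: 50}, {0: 100, 1: 40}) yields 10 events still waiting, so a job in that state is not yet finished even if some of its files already show Success.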