Data Ingestion

Ingesting data means importing large data files from multiple sources into a single, cloud-based storage medium (a data warehouse, data mart, or database) where TruEra can access and analyze them.

First things first

Create a project before attempting to ingest data. Review the Quickstart for a brief overview of the ingestion process using sample data.

To ingest project data, TruEra supports the following methods:

In terms of task breakout, these comprise, at minimum:

  • Pre-deployment
    • Feature Development – transforming raw data into features that better represent the underlying problem, resulting in improved model accuracy on unseen data.
    • Model Training – fitting the best combination of weights and bias to minimize loss functions over the prediction range.
  • Post-deployment
    • Logging Inputs and Predictions – classifying whether a particular log event, or set of events, is causing a real incident that requires attention.
    • Logging Additional Metrics – score tracking to determine real accuracy and improvement, including:
      • F1 – the harmonic mean of a model's precision and recall, giving the two equal weight.
      • F2 – a weighted harmonic mean of precision and recall (given a threshold value). Unlike the F1 score, the F2 score gives more weight to recall than to precision.
      • brier_loss – the mean squared difference between the predicted probability and the actual outcome.
      • Iteration-level metrics (learning curves), predictions after every epoch, and updated experiment metrics, among many others.
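As a rough illustration of how the F1, F2, and brier_loss metrics named above are calculated, here is a minimal pure-Python sketch. It is not TruEra's implementation or API: the function names, the sample labels and probabilities, and the 0.5 threshold are all assumptions chosen for the example. F1 and F2 are both computed from the general F-beta formula, with beta = 1 and beta = 2 respectively.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def fbeta_score(y_true, y_pred, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta=1 gives F1 (equal weight); beta=2 gives F2 (recall weighted more).
    """
    precision, recall = precision_recall(y_true, y_pred)
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denom


def brier_loss(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical sample data: actual outcomes and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0 + 1]
y_prob = [0.9, 0.6, 0.4, 0.8, 0.2, 0.3]

# F1 and F2 need hard predictions, so apply an assumed threshold of 0.5.
threshold = 0.5
y_pred = [1 if p >= threshold else 0 for p in y_prob]

f1 = fbeta_score(y_true, y_pred, beta=1.0)
f2 = fbeta_score(y_true, y_pred, beta=2.0)
brier = brier_loss(y_true, y_prob)

print(f"F1={f1:.4f}  F2={f2:.4f}  brier_loss={brier:.4f}")
```

In this sample, recall is lower than precision, so the F2 score comes out below the F1 score. The Brier loss is computed on the probabilities directly and does not depend on the threshold.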

Above all, have a plan for tracking and handling your model's results, both expected and unexpected, so you can keep refining your data and model as you go.
