Adding Data and Models¶
Add data for your model across its lifecycle, from development to production, using the TruEra SDK.¶
TruEra stores model data as records in a data collection. A data collection has a schema that all records must conform to. The fields in a data collection belong to one of the following data kinds:
- Features: independent variables (data) that are input into a model to generate predictions.
- Labels: the dependent variable used to train a model and/or to evaluate the performance of a model in production.
- Predictions: The scores that a model generates, or outputs, from feature inputs. These are continuous or discrete values depending on the modeling use case.
- Non-input data: Extra categorical or nominal data used to further analyze model behavior. TruEra uses non-input "extra" data to power its segmentation and fairness features. TruEra Monitoring services can also use extra data for tagging and segmentation, and to track business KPIs and metrics that are associated with the model but not specific to it, in parallel with model inputs and outputs.
- Influences: Feature influences represent the marginal contributions of specific feature values to a model output (i.e., prediction/score). See our supported metrics section for a deep dive on feature influences. Feature influences can be generated by TruEra, using an ingested model artifact and input data (model features). Alternatively, these can be generated independently of the TruEra ingestion process and ingested into a TruEra project.
All records also include a unique ID.
Data kinds for a record can be ingested separately.
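As a rough illustration of how the fields of one record map onto data kinds, the sketch below splits a pandas DataFrame into one frame per data kind, each keyed by the record ID. The column names, the `DATA_KINDS` mapping, and the `split_by_data_kind` helper are hypothetical and are not part of the TruEra SDK; the sketch only shows the idea that each data kind can travel separately as long as it carries the ID.

```python
import pandas as pd

# Hypothetical column-to-data-kind mapping (illustrative only, not TruEra's schema format).
DATA_KINDS = {
    "features": ["age", "income"],
    "labels": ["defaulted"],
    "predictions": ["score"],
    "extra": ["region"],
    "influences": ["age_influence", "income_influence"],
}
ID_COLUMN = "record_id"  # assumed name for the unique record ID


def split_by_data_kind(records: pd.DataFrame) -> dict:
    """Return one frame per data kind that is present, each including the record ID."""
    parts = {}
    for kind, columns in DATA_KINDS.items():
        present = [c for c in columns if c in records.columns]
        if present:
            parts[kind] = records[[ID_COLUMN] + present]
    return parts


records = pd.DataFrame({
    "record_id": ["r1", "r2"],
    "age": [34, 51],
    "income": [52000, 87000],
    "defaulted": [0, 1],
    "score": [0.12, 0.81],
    "region": ["west", "east"],
})

# No influence columns exist yet, so that data kind is simply absent here.
parts = split_by_data_kind(records)
```

Because every part keeps `record_id`, a data kind that arrives later (for example, influences) can be matched back to the same records.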
Data across your model's lifecycle¶
Pre-production data¶
Add your model's pre-production data to TruEra to evaluate model training experiments in TruEra Diagnostics. TruEra also uses added pre-production data as a comparison baseline against your model's production data to track drift and data quality metrics over time in TruEra Monitoring. Pre-production datasets in TruEra are finite and bounded.
During model development, each data kind for a record can be ingested separately. Generally, customers ingest records in two steps. First, before model training, they ingest a "model agnostic" record containing features and labels together with a unique record ID. Next, after model training, they "complete" the records for each trained model by adding the "model specific" data kinds (feature influences and predictions), using the same record ID to associate the data kinds with each other.
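The two-step pattern can be pictured as a join on the record ID. The following is a minimal pandas sketch of that idea, not the TruEra SDK's ingestion API; the frame names, column names, and values are made up for illustration.

```python
import pandas as pd

# Step 1: "model agnostic" records (features and labels), available before training.
base = pd.DataFrame({
    "record_id": ["r1", "r2", "r3"],
    "age": [34, 51, 27],
    "defaulted": [0, 1, 0],
})

# Step 2: "model specific" data kinds for one trained model, produced after training.
model_a = pd.DataFrame({
    "record_id": ["r1", "r2", "r3"],
    "prediction": [0.12, 0.81, 0.30],
    "age_influence": [-0.05, 0.22, -0.01],
})

# The shared record ID associates the data kinds. validate= raises an error
# if an ID is duplicated on either side, which would make the match ambiguous.
completed = base.merge(model_a, on="record_id", how="left", validate="one_to_one")
```

A second trained model would contribute its own predictions and influences against the same `base` records, which is why the model-agnostic part only needs to be ingested once.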
Production data¶
Add your model's production data to track its behavior over time in TruEra Monitoring. Production datasets in TruEra are unbounded time series that your model produces after it is deployed. TruEra supports ingesting data produced by batch inference and real-time inference models.
In production, the features, predictions, and non-input "extra" data kinds are ingested together.
The feature influences and labels data kinds can be ingested later and separately, because they are not known at inference time.
TruEra's Monitoring service periodically joins ingested labels and predictions and computes performance metrics to track a model's performance over time.
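To make the label/prediction join concrete, here is a small pandas sketch of the general technique: predictions are recorded at inference time, labels arrive later, and a metric is computed per time window over the records that have both. This is an assumption-laden illustration, not a description of how TruEra Monitoring is implemented; the column names, the daily window, and the choice of accuracy as the metric are all hypothetical.

```python
import pandas as pd

# Predictions are ingested at inference time, with a timestamp.
predictions = pd.DataFrame({
    "record_id": ["r1", "r2", "r3", "r4"],
    "timestamp": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 17:00",
        "2024-01-02 10:00", "2024-01-02 11:00",
    ]),
    "prediction": [1, 0, 1, 1],
})

# Labels are ingested later; r4 has no label yet.
labels = pd.DataFrame({
    "record_id": ["r1", "r2", "r3"],
    "label": [1, 1, 1],
})

# Join on the record ID; the inner join keeps only records that already have a label.
joined = predictions.merge(labels, on="record_id", how="inner")
joined["correct"] = joined["prediction"] == joined["label"]

# Accuracy per calendar day of the prediction timestamp.
daily_accuracy = joined.groupby(joined["timestamp"].dt.date)["correct"].mean()
```

Re-running the join as more labels arrive fills in records such as `r4`, so the metric for a given window can change until its labels are complete.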
Two Modes to Add Data: Data and Model vs. Data Only¶
TruEra supports two data ingestion modes for both pre-production and production data.
Data and Model¶
In this mode, the user sends TruEra their model artifact. This enables the TruEra SDK to compute and ingest feature influences, which are useful for understanding model behavior. During model development, the computed feature influences are used for model evaluation and debugging. Once a model is in production, any computed feature influences can be used to debug degradations in model quality.
Data Only¶
In this mode, the user does not send TruEra their model artifact, so TruEra is not used as the feature influence computation engine. Users opt to send data only if they don't have access to their model artifact or if they have an existing external method for generating feature influences.
The following sections elaborate on adding data and model artifacts to TruEra.