Batch Ingestion and Prediction Tagging

Ingesting batches of production data for monitoring is similar to diagnostics ingestion, with one additional requirement: the data must include a timestamp column. This timestamp should represent the prediction time of each event.

Batch Ingestion

To ingest production data in batch with TruEra's Python SDK, call the add_production_data() method as follows:

tru.add_production_data(
    pd,  # the batch of production data to ingest
    column_spec=ColumnSpec(
        id_col_name="id",
        pre_data_col_names=pre_data_names,
        timestamp_col_name="prediction_time",  # column holding each prediction's time
        label_col_names=["label"],
    ),
)
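
Before calling add_production_data(), it can help to confirm that the batch contains every column the ColumnSpec names, the timestamp column in particular. The following is a minimal sketch in plain Python. Note that missing_spec_columns is a hypothetical helper, not part of the TruEra SDK, and it assumes you can list the batch's column names (for example, list(df.columns) if the batch is a pandas DataFrame).

```python
def missing_spec_columns(batch_columns, id_col, pre_data_cols, timestamp_col, label_cols=()):
    """Return the column names required by the spec that are absent from the batch.

    Hypothetical pre-ingestion check; not part of the TruEra SDK.
    Names are returned in the order: id, pre-data, timestamp, labels.
    """
    required = [id_col, *pre_data_cols, timestamp_col, *label_cols]
    present = set(batch_columns)
    return [name for name in required if name not in present]


# Example: this batch lacks the timestamp column that monitoring requires.
batch_columns = ["id", "age", "income", "label"]
missing = missing_spec_columns(
    batch_columns,
    id_col="id",
    pre_data_cols=["age", "income"],
    timestamp_col="prediction_time",
    label_cols=["label"],
)
print(missing)
```

An empty result means every named column is present. A non-empty result lists what to add or rename before ingestion.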

Prediction Tagging for Monitoring Segmentation

Certain Monitoring dashboard panels must be filtered so that they reflect only predictions carrying a given tag. You set up these views during dashboard creation, and the tags themselves must conform to the following specification:

  • Each prediction tag can be an arbitrary string of up to 30 characters.
  • The maximum number of tags attached to a given prediction is 12.
  • The data type of the input column must be either string or a list of strings.

Other data types will be implicitly converted during ingestion.
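
The limits above can be checked on your side before ingestion. The sketch below is one way to do that in plain Python. Note that tag_problems is a hypothetical helper, not part of the TruEra SDK. It also deliberately flags values that are neither a string nor a list of strings instead of relying on implicit conversion, which is an assumption about what you would want to catch early.

```python
MAX_TAG_LENGTH = 30  # characters per tag, per the specification above
MAX_TAGS = 12        # tags per prediction, per the specification above


def tag_problems(value):
    """Return a list of problems found in one tags cell; empty if the cell is valid.

    Hypothetical pre-ingestion check; not part of the TruEra SDK.
    """
    # A single string counts as one tag; a list counts as several tags.
    if isinstance(value, str):
        tags = [value]
    elif isinstance(value, list):
        tags = value
    else:
        return [f"unsupported type: {type(value).__name__}"]

    problems = []
    if len(tags) > MAX_TAGS:
        problems.append(f"{len(tags)} tags exceeds the limit of {MAX_TAGS}")
    for tag in tags:
        if not isinstance(tag, str):
            problems.append(f"non-string tag: {tag!r}")
        elif len(tag) > MAX_TAG_LENGTH:
            problems.append(f"tag longer than {MAX_TAG_LENGTH} characters: {tag!r}")
    return problems


print(tag_problems(["region:emea", "channel:web"]))  # valid cell
print(tag_problems("x" * 31))                        # one tag that is too long
```

Running the check over every value in the tags column and collecting the non-empty results gives a list of rows to fix before the batch is sent.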

To ingest these tags along with the predictions, specify tags_col_name in the ColumnSpec passed to add_production_data(), as shown next:

tru.add_production_data(
    pd,
    data_split_name="my-prod-split",
    column_spec=ColumnSpec(
        id_col_name="id",
        pre_data_col_names=pre_data_names,
        timestamp_col_name="prediction_time",
        label_col_names=["label"],
        tags_col_name="tags",  # column holding each prediction's tag or list of tags
    ),
)
