Ingesting Production Data
Production data is ingested into TruEra in batches or as a stream of records, depending on whether your model makes batch inferences or real-time inferences.
If your model makes batch inferences on a periodic basis, we recommend adding a step to your batch inference pipeline that runs add_production_data() on the produced dataframe or object storage file.
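As a rough illustration of where such a step sits, the sketch below appends an ingestion call to the end of a toy batch job. Everything in it is an assumption made for illustration: the `ingest` callable stands in for a call such as add_production_data(), whose actual arguments should be taken from the TruEra SDK reference, and plain lists of dictionaries stand in for the dataframe.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def run_batch_inference(rows: List[Row], predict: Callable[[Row], float]) -> List[Row]:
    """Score every row and return the inputs together with the prediction."""
    return [{**row, "prediction": predict(row)} for row in rows]


def batch_pipeline(
    rows: List[Row],
    predict: Callable[[Row], float],
    ingest: Callable[[List[Row]], None],
) -> List[Row]:
    """Run the batch job, then hand the scored batch to the monitoring step."""
    scored = run_batch_inference(rows, predict)
    # Final pipeline step: push the scored batch to monitoring.
    # `ingest` is a hypothetical stand-in for the real SDK call.
    ingest(scored)
    return scored


# Stand-in that only records what it receives (not the TruEra client).
ingested_batches: List[List[Row]] = []

scored = batch_pipeline(
    rows=[{"id": 1, "x": 2.0}, {"id": 2, "x": 3.5}],
    predict=lambda row: row["x"] * 2,
    ingest=ingested_batches.append,
)
```

Placing the call last means the batch is reported only after every row has been scored, so a failed inference run does not produce a partial production batch.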
If your model makes real-time inferences, we recommend using TruEra's ingest_events() function to ingest the inputs and outputs of your model as it makes inferences.
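One possible way to wire this into a serving application is to wrap the prediction function so that each inference also emits an event. The sketch below is illustrative only: `send_event` is a hypothetical stand-in for a call such as ingest_events(), and the event fields (`event_id`, `timestamp`, `inputs`, `output`) are assumed names, not the TruEra schema.

```python
import logging
import time
import uuid
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


def make_monitored_predict(
    predict: Callable[[Dict[str, Any]], Any],
    send_event: Callable[[Event], None],
) -> Callable[[Dict[str, Any]], Any]:
    """Wrap `predict` so that every inference also emits a monitoring event."""

    def monitored(features: Dict[str, Any]) -> Any:
        output = predict(features)
        event = {
            "event_id": str(uuid.uuid4()),  # assumed field name
            "timestamp": time.time(),       # assumed field name
            "inputs": features,
            "output": output,
        }
        try:
            # Hypothetical stand-in for the real ingestion call.
            send_event(event)
        except Exception:
            # A monitoring failure is logged but does not fail the request.
            logging.warning("could not send monitoring event", exc_info=True)
        return output

    return monitored


# Stand-in sink that only records events (not the TruEra client).
sent_events: List[Event] = []
serve = make_monitored_predict(lambda f: f["x"] > 1.0, sent_events.append)
result = serve({"x": 2.5})
```

Catching ingestion errors here is a design choice in this sketch, so that a monitoring outage does not interrupt serving; whether that trade-off suits your application is for you to decide.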
If you have a single-model real-time endpoint deployed to Amazon SageMaker with data capture enabled, TruEra can read the captured model inputs and outputs using its Data Capture integration. On a scheduled basis, this integration reads new files in your Data Capture S3 prefix, parses them, and pulls the data into your model's production data stream without any custom integration work in your serving application.
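To give a sense of what parsing a capture file involves, the sketch below extracts the input and output payloads from one line of a capture file. It assumes the JSON Lines layout that SageMaker documents for Data Capture (`captureData`, `endpointInput`, `endpointOutput`, `eventMetadata`); verify the layout against your own files. The function name and returned keys are hypothetical, and this is not TruEra's implementation.

```python
import base64
import json
from typing import Any, Dict, Optional


def parse_capture_line(line: str) -> Dict[str, Any]:
    """Extract input and output payloads from one capture record (one JSON line)."""
    record = json.loads(line)
    capture = record["captureData"]
    meta = record.get("eventMetadata", {})

    def payload(section: str) -> Optional[str]:
        part = capture.get(section)
        if part is None:
            return None  # e.g. only inputs or only outputs were captured
        data = part["data"]
        if part.get("encoding") == "BASE64":
            # Assumes the decoded payload is UTF-8 text.
            data = base64.b64decode(data).decode("utf-8")
        return data

    return {
        "event_id": meta.get("eventId"),
        "inference_time": meta.get("inferenceTime"),
        "input": payload("endpointInput"),
        "output": payload("endpointOutput"),
    }


# Made-up sample record in the assumed layout.
sample_line = json.dumps({
    "captureData": {
        "endpointInput": {
            "observedContentType": "text/csv",
            "mode": "INPUT",
            "data": "5.1,3.5,1.4,0.2",
            "encoding": "CSV",
        },
        "endpointOutput": {
            "observedContentType": "text/csv",
            "mode": "OUTPUT",
            "data": base64.b64encode(b"0.93").decode("ascii"),
            "encoding": "BASE64",
        },
    },
    "eventMetadata": {"eventId": "example-id", "inferenceTime": "2024-01-01T00:00:00Z"},
    "eventVersion": "0",
})

parsed = parse_capture_line(sample_line)
```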
Data for all types of production data ingestion (batch push, scheduled pull, or streaming push) comes in essentially two flavors:
- Structured (tabular) data: data in a database, highly organized so that it can be easily searched, changed, and analyzed.
- Unstructured data: typically rich media such as long-form text, audio, and video. It accounts for 80% of data in enterprises and is often difficult to manage, store, and analyze because it has no predefined format or structure, which prevents information from being organized automatically.
TruEra monitoring dashboards can handle models of any type when monitoring focuses on model output, labels, custom metrics, and segment tags. Tracking data drift and data quality, however, is currently supported only for tabular inputs.
Click a link below to explore your options and determine what's suitable for your particular use case.