Reading Local Files
TruEra supports ingesting records from local CSV and Parquet files.
Data Ingestion Basics
TruEra's file ingestion relies on registering a file as a Table data source. Once registered, a table can be ingested with add_data() and add_production_data(), just as a DataFrame would be.
The steps to ingest local files into TruEra are:
- Add the data source containing the Table object (see add_data_source()).
- Add the data in the table to create a data split, or ingest it into a production data stream (see add_data() and add_production_data()).
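import pandas as pd
from truera.client.ingestion import ColumnSpec

# Assumes `tru` is a TruEra workspace that is already connected and set
# to a project and data collection.

# Save a pandas DataFrame as a local CSV file
# (index=False keeps the DataFrame index out of the file, so only the
# named columns are written)
pd.DataFrame({
    "id": [1, 2, 3],
    "feature1": [0, 0, 1],
    "feature2": [1, 1, 0]
}).to_csv("data.csv", index=False)

# Register the local file as a Table data source
data_source = tru.add_data_source("data.csv")

# Create a data split from the data source
tru.add_data(
    data_source,
    data_split_name="split_1",
    column_spec=ColumnSpec(
        id_col_name="id",
        pre_data_col_names=["feature1", "feature2"]
    )
)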
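The column names passed to ColumnSpec presumably need to match the columns actually present in the file, so it can help to check the file locally before calling add_data(). The following is a minimal sketch using only pandas; check_local_file is a hypothetical helper, not part of the TruEra SDK, and the rule that record IDs must be unique is an assumption. It picks the reader from the file extension, so it also covers Parquet files (reading those requires a Parquet engine such as pyarrow).

```python
import pandas as pd


def check_local_file(path, id_col, feature_cols):
    """Hypothetical pre-flight check, not part of the TruEra SDK.

    Loads a local CSV or Parquet file with pandas and returns a list of
    problems with the intended column layout (an empty list means none found).
    """
    # Choose the reader from the file extension; Parquet needs pyarrow or fastparquet
    if path.lower().endswith(".parquet"):
        df = pd.read_parquet(path)
    else:
        df = pd.read_csv(path)

    problems = []
    # Every column named for the ColumnSpec should exist in the file
    missing = [c for c in [id_col, *feature_cols] if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    # Assumption: record IDs should be unique so each row is identifiable
    if id_col in df.columns and df[id_col].duplicated().any():
        problems.append(f"duplicate values in id column {id_col!r}")
    return problems


if __name__ == "__main__":
    # Same data as the example above, written without the pandas index
    pd.DataFrame({
        "id": [1, 2, 3],
        "feature1": [0, 0, 1],
        "feature2": [1, 1, 0],
    }).to_csv("data.csv", index=False)
    print(check_local_file("data.csv", "id", ["feature1", "feature2"]))  # prints []
```

An empty list means the file matches the intended layout; any returned message names a column or ID problem to fix in the file or the ColumnSpec before ingesting.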