Reading Local Files
TruEra supports ingesting records from local CSV and Parquet files.
Data Ingestion Basics
TruEra's file ingestion relies on the user registering their file as a Table data source. Once registered, tables can be ingested using add_data() and add_production_data(), just like a dataframe.
The steps to ingest local files into TruEra are:
- Add the data source containing the Table object (see add_data_source()).
- Add the data in the table in order to create a data split or ingest to a production data stream (see add_data() and add_production_data()).
import pandas as pd
from truera.client.ingestion import ColumnSpec

# Save pandas DataFrame as local CSV file.
# index=False keeps the DataFrame index from being written as an extra unnamed column.
pd.DataFrame({
    "id": [1, 2, 3],
    "feature1": [0, 0, 1],
    "feature2": [1, 1, 0]
}).to_csv("data.csv", index=False)

# Add data source
# (`tru` is assumed to be an already-configured TruEra workspace object)
data_source = tru.add_data_source("data.csv")

# Create split from data source
tru.add_data(
    data_source,
    data_split_name="split_1",
    column_spec=ColumnSpec(
        id_col_name="id",
        pre_data_col_names=["feature1", "feature2"]
    )
)
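Before registering a file, it can help to confirm that the CSV header matches the columns named in the ColumnSpec. The following is a minimal sketch of such a check using only the Python standard library. `compare_csv_header` is a hypothetical helper written for illustration; it is not part of the TruEra SDK, and TruEra may perform its own validation during ingestion.

```python
import csv


def compare_csv_header(path, id_col, feature_cols):
    """Compare a CSV header against the columns a column spec expects.

    Hypothetical helper for illustration; not part of the TruEra SDK.
    Returns (missing, extra):
      missing - expected columns that are absent from the file
      extra   - file columns that were not expected
    """
    with open(path, newline="") as f:
        # Read only the first row; an empty file yields an empty header.
        header = next(csv.reader(f), [])
    expected = [id_col, *feature_cols]
    missing = [col for col in expected if col not in header]
    extra = [col for col in header if col not in expected]
    return missing, extra
```

Calling `compare_csv_header("data.csv", "id", ["feature1", "feature2"])` returns two empty lists when the file contains exactly the expected columns. For a CSV written with the pandas default of `index=True`, the second list would contain a column with an empty name, which is the DataFrame index.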