Migrating to add_data() from add_data_split()
Users who currently ingest data with the Python SDK's add_data_split() method should follow the guidance here to migrate to the add_data() method, which offers faster ingestion at scale.
Python SDK Update
The TrueraWorkspace.add_data_split() method is deprecated in favor of TrueraWorkspace.add_data(), which is now TruEra's recommended data ingestion method.
As the snippets below show, the most obvious difference is that add_data_split() takes a separate argument for each kind of data (pre_data, label_data, etc.), while add_data() consolidates these into a single data argument and uses a ColumnSpec object to specify which kind of data each column holds.
# add_data_split (DEPRECATED)
tru.add_data_split(
    ...,
    pre_data=data_df[pre_data_col_names],
    label_data=data_df[label_col_names],
    extra_data_df=data_df[extra_data_col_names]
)
from truera.client.ingestion import ColumnSpec

# add_data (RECOMMENDED)
tru.add_data(
    ...,
    data=data_df,
    column_spec=ColumnSpec(
        pre_data_col_names=pre_data_col_names,
        label_col_names=label_col_names,
        extra_data_col_names=extra_data_col_names
    )
)
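Because add_data() receives one dataframe, every column named in the ColumnSpec has to be present in it. The sketch below is a hypothetical pre-flight check and not part of the TruEra SDK: check_column_roles is an illustrative helper that takes the dataframe's column names (for example list(data_df.columns)) and the name lists intended for ColumnSpec, and reports columns that are missing. It also flags columns listed under more than one role, on the assumption that each column is meant to play a single role.

```python
from collections import Counter


def check_column_roles(available_columns, **role_columns):
    """Return a list of problems found in a planned column assignment.

    Hypothetical helper, not part of the TruEra SDK.
    available_columns: column names of the combined dataframe.
    role_columns: role name -> list of column names, e.g.
        pre_data_col_names=[...], label_col_names=[...].
    """
    available = set(available_columns)
    problems = []

    # Every column named under a role must exist in the dataframe.
    for role, names in role_columns.items():
        for name in names:
            if name not in available:
                problems.append(f"{role}: column {name!r} not found in data")

    # Assumption: a column should be assigned to at most one role.
    counts = Counter(
        name for names in role_columns.values() for name in set(names)
    )
    for name, count in sorted(counts.items()):
        if count > 1:
            problems.append(f"column {name!r} is assigned to {count} roles")

    return problems


# Example with made-up column names; "region" is not in the data.
columns = ["age", "income", "defaulted"]
issues = check_column_roles(
    columns,
    pre_data_col_names=["age", "income", "region"],
    label_col_names=["defaulted"],
)
print(issues)
```

An empty list means the planned assignment is consistent with the dataframe's columns, so the same name lists can be passed on to ColumnSpec.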
To merge separate dataframes into one and obtain the matching ColumnSpec object, use the merge_dataframes_and_create_column_spec function.
from truera.client.ingestion.util import merge_dataframes_and_create_column_spec

# add_data_split
tru.add_data_split(
    data_split_name="my_data_split",
    pre_data=pre_df,
    post_data=post_df,
    label_data=label_df,
    prediction_data=prediction_df,
    feature_influence_data=feature_influence_df,
    extra_data_df=extra_df,
    id_col_name="my_id",
    timestamp_col_name="my_timestamp"
)
# equivalent add_data call
data_df, column_spec = merge_dataframes_and_create_column_spec(
    id_col_name="my_id",
    timestamp_col_name="my_timestamp",
    pre_data=pre_df,
    post_data=post_df,
    labels=label_df,
    predictions=prediction_df,
    feature_influences=feature_influence_df,
    extra_data=extra_df
)
tru.add_data(
    data_split_name="my_data_split",
    data=data_df,
    column_spec=column_spec
)
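Where existing code already collects its add_data_split() arguments in a dictionary, the keyword renames visible in the two calls above can be applied mechanically. The following is a minimal sketch of such a translation. translate_split_kwargs and the three name tables are hypothetical, not part of the SDK, and they cover only the keywords that appear in the example above; anything else raises an error rather than being guessed.

```python
# Keyword renames between add_data_split() and
# merge_dataframes_and_create_column_spec(), taken from the example above.
SPLIT_TO_MERGE_NAMES = {
    "label_data": "labels",
    "prediction_data": "predictions",
    "feature_influence_data": "feature_influences",
    "extra_data_df": "extra_data",
}

# Keywords that keep the same name in both calls.
UNCHANGED_NAMES = {"pre_data", "post_data", "id_col_name", "timestamp_col_name"}

# Keywords that are passed to add_data() itself, not to the merge function.
ADD_DATA_NAMES = {"data_split_name"}


def translate_split_kwargs(split_kwargs):
    """Split add_data_split() kwargs into (merge_kwargs, add_data_kwargs).

    Hypothetical helper, not part of the TruEra SDK. Keywords outside the
    tables above raise KeyError so nothing is silently dropped.
    """
    merge_kwargs, add_data_kwargs = {}, {}
    for name, value in split_kwargs.items():
        if name in ADD_DATA_NAMES:
            add_data_kwargs[name] = value
        elif name in UNCHANGED_NAMES:
            merge_kwargs[name] = value
        elif name in SPLIT_TO_MERGE_NAMES:
            merge_kwargs[SPLIT_TO_MERGE_NAMES[name]] = value
        else:
            raise KeyError(f"no known add_data equivalent for {name!r}")
    return merge_kwargs, add_data_kwargs


# Strings stand in for dataframes so the sketch runs without pandas.
old_kwargs = {
    "data_split_name": "my_data_split",
    "pre_data": "pre_df",
    "label_data": "label_df",
    "extra_data_df": "extra_df",
    "id_col_name": "my_id",
}
merge_kwargs, add_data_kwargs = translate_split_kwargs(old_kwargs)
print(merge_kwargs)
print(add_data_kwargs)
```

With real dataframes, merge_kwargs would be unpacked into merge_dataframes_and_create_column_spec(**merge_kwargs), and the resulting data_df and column_spec passed to tru.add_data() together with **add_data_kwargs.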