
Migrating to add_data() from add_data_split()

If you currently call the Python SDK's add_data_split() method, follow the guidance here to migrate to the add_data() method, which provides faster ingestion at scale.

Python SDK Update

The TrueraWorkspace.add_data_split() method is deprecated. TrueraWorkspace.add_data() replaces it as TruEra's recommended data ingestion method.

As the snippets below show, the most visible difference is that add_data_split() takes a separate argument for each kind of data (pre_data, label_data, etc.), whereas add_data() takes all of it through a single data argument, together with a ColumnSpec object that specifies which kind of data each column holds.

# add_data_split (DEPRECATED)
tru.add_data_split(
    ...,
    pre_data=data_df[pre_data_col_names],
    label_data=data_df[label_col_names],
    extra_data_df=data_df[extra_data_col_names]
)

# add_data (RECOMMENDED)
tru.add_data(
    ...,
    data=data_df,
    column_spec=ColumnSpec(
        pre_data_col_names=pre_data_col_names,
        label_col_names=label_col_names,
        extra_data_col_names=extra_data_col_names
    )
)
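To make the idea behind ColumnSpec concrete, here is a minimal, stdlib-only sketch of "one table plus a spec of column roles". It assumes rows are plain dicts rather than a pandas DataFrame, and RoleSpec and split_by_role are hypothetical names invented for illustration; they are not part of the TruEra SDK and do not reproduce how ColumnSpec works internally. The point is only that a single table and a role spec carry the same information as the separate per-role arguments of add_data_split().

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


# Hypothetical stand-in for the idea behind ColumnSpec; this is NOT TruEra's class.
@dataclass
class RoleSpec:
    id_col_name: str
    pre_data_col_names: Sequence[str] = field(default_factory=list)
    label_col_names: Sequence[str] = field(default_factory=list)
    extra_data_col_names: Sequence[str] = field(default_factory=list)

    def roles(self) -> Dict[str, Sequence[str]]:
        """Map each kind of data to the columns that hold it."""
        return {
            "pre_data": self.pre_data_col_names,
            "labels": self.label_col_names,
            "extra_data": self.extra_data_col_names,
        }


def split_by_role(rows: List[dict], spec: RoleSpec) -> Dict[str, List[dict]]:
    """Rebuild one per-role view of a single table.

    Every view keeps the ID column so rows can still be matched across
    roles, much like the separate pre_data/label_data/extra_data
    arguments of the deprecated add_data_split().
    """
    # Rows are assumed to share the same keys, so the first row defines the columns.
    present = set(rows[0]) if rows else set()
    for role, cols in spec.roles().items():
        missing = [c for c in cols if c not in present]
        if rows and missing:
            # Fail loudly instead of silently ingesting a partial table.
            raise KeyError(f"{role}: columns not in data: {missing}")
    return {
        role: [
            {spec.id_col_name: row[spec.id_col_name], **{c: row[c] for c in cols}}
            for row in rows
        ]
        for role, cols in spec.roles().items()
    }


# One table, one spec: the same information as three separate arguments.
rows = [
    {"my_id": 1, "age": 34, "income": 52000, "label": 0, "region": "EU"},
    {"my_id": 2, "age": 51, "income": 87000, "label": 1, "region": "US"},
]
spec = RoleSpec(
    id_col_name="my_id",
    pre_data_col_names=["age", "income"],
    label_col_names=["label"],
    extra_data_col_names=["region"],
)
views = split_by_role(rows, spec)
print(views["labels"])  # [{'my_id': 1, 'label': 0}, {'my_id': 2, 'label': 1}]
```

The design choice worth noting is that the role information lives beside the data instead of being encoded in argument names, so adding a new kind of column means extending the spec rather than changing the call signature.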

If your data is currently spread across separate DataFrames, use the merge_dataframes_and_create_column_spec() function to merge them into a single DataFrame and obtain the matching ColumnSpec object.

from truera.client.ingestion.util import merge_dataframes_and_create_column_spec

# add_data_split
tru.add_data_split(
    data_split_name="my_data_split",
    pre_data=pre_df,
    post_data=post_df,
    label_data=label_df,
    prediction_data=prediction_df,
    feature_influence_data=feature_influence_df,
    extra_data_df=extra_df,
    id_col_name="my_id",
    timestamp_col_name="my_timestamp"
)

# equivalent add_data call
data_df, column_spec = merge_dataframes_and_create_column_spec(
    id_col_name="my_id",
    timestamp_col_name="my_timestamp",
    pre_data=pre_df,
    post_data=post_df,
    labels=label_df,
    predictions=prediction_df,
    feature_influences=feature_influence_df,
    extra_data=extra_df
)
tru.add_data(
    data_split_name="my_data_split",
    data=data_df,
    column_spec=column_spec
)
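When migrating many call sites, the keyword renames in the example above can be captured in a small lookup table. The sketch below is a hypothetical helper (SPLIT_TO_MERGE_KWARGS and translate_add_data_split_kwargs are illustrative names, not part of the TruEra SDK), and its mapping covers only the arguments that appear in the example on this page; any other add_data_split() argument is rejected rather than guessed.

```python
from typing import Any, Dict, Tuple

# Keyword renames taken from the add_data_split() / merge_dataframes_and_create_column_spec()
# example above. Hypothetical helper for illustration; not part of the TruEra SDK.
SPLIT_TO_MERGE_KWARGS = {
    "pre_data": "pre_data",
    "post_data": "post_data",
    "label_data": "labels",
    "prediction_data": "predictions",
    "feature_influence_data": "feature_influences",
    "extra_data_df": "extra_data",
    "id_col_name": "id_col_name",
    "timestamp_col_name": "timestamp_col_name",
}


def translate_add_data_split_kwargs(
    kwargs: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split old add_data_split() kwargs into (merge_kwargs, add_data_kwargs).

    merge_kwargs are meant for merge_dataframes_and_create_column_spec();
    add_data_kwargs (here only data_split_name) go straight to add_data().
    Unknown keys raise so nothing is silently dropped.
    """
    merge_kwargs: Dict[str, Any] = {}
    add_data_kwargs: Dict[str, Any] = {}
    for key, value in kwargs.items():
        if key == "data_split_name":
            add_data_kwargs[key] = value
        elif key in SPLIT_TO_MERGE_KWARGS:
            merge_kwargs[SPLIT_TO_MERGE_KWARGS[key]] = value
        else:
            raise KeyError(f"no known add_data() equivalent for {key!r}; migrate it manually")
    return merge_kwargs, add_data_kwargs


# Strings stand in for DataFrames so the sketch runs without the SDK or pandas.
old_kwargs = {
    "data_split_name": "my_data_split",
    "pre_data": "pre_df",
    "label_data": "label_df",
    "extra_data_df": "extra_df",
    "id_col_name": "my_id",
}
merge_kwargs, add_data_kwargs = translate_add_data_split_kwargs(old_kwargs)
print(merge_kwargs)     # {'pre_data': 'pre_df', 'labels': 'label_df', 'extra_data': 'extra_df', 'id_col_name': 'my_id'}
print(add_data_kwargs)  # {'data_split_name': 'my_data_split'}
```

With real DataFrames, the two dictionaries would then feed the calls shown above: data_df, column_spec = merge_dataframes_and_create_column_spec(**merge_kwargs), followed by tru.add_data(data=data_df, column_spec=column_spec, **add_data_kwargs).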
