
Basic Ingestion Methods

As with data ingestion, you can use the Python SDK to ingest (import) models along several paths, depending on whether and how your model is already packaged. If you're a new user, you may want to quickly review the general project structure supported by TruEra.

Otherwise, this topic covers basic methods, with starter guidance on adding a model, computing and adding predictions, and computing and adding feature and error influences.

First things first

Before ingesting a model, you'll first need to ingest a data collection structure: an organized inventory of data consisting of the individual data splits used for a particular model. A data split is one of two or more subsets of the data collection; with a typical two-part split, one part is used to train the model and the other to evaluate or test it. See Data Ingestion for guidance.

TruEra's Python SDK supports the ingestion of packaged models (an executable Python model object, along with a collection of modules arranged in a hierarchy of folders that includes an file), as well as models without an executable Python object attached. If no executable object is available, model predictions and influences must be computed externally before being added.

Most Commonly Used Methods

As briefly introduced in Quickstart for Diagnostics, the most commonly used model ingestion method is add_python_model().

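```python
from sklearn.ensemble import GradientBoostingRegressor

# Instantiate and fit the model (XS: training features, YS: training labels).
gb_model = GradientBoostingRegressor().fit(XS, YS)

# Add the fitted model to the TruEra workspace (`tru` is your TrueraWorkspace connection).
tru.add_python_model("model 1", gb_model)
```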
However, if yours is a natural language processing (NLP) model, only virtual model ingestion is supported. (NLP combines computational linguistics, the rule-based modeling of human language, with statistical, machine learning, and deep learning models.)

# NLP only supports virtual models currently
For more on NLP, see NLP Diagnostics.

Additional methods are discussed next for:

Packaged Models

Use TrueraWorkspace.add_packaged_python_model() to ingest a packaged model. This method registers and adds the new model, including the executable model object provided, and deduces the model framework so that it can serialize the model object appropriately and upload it to the TruEra server.

Models from supported frameworks (scikit-learn, XGBoost, LightGBM, CatBoost, and PySpark tree models) can be passed directly.
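To illustrate what deducing the model framework can involve, the sketch below looks at the top-level package that defines a model object's class. The helper `deduce_framework` and its package-to-label mapping are hypothetical; this is not TruEra's implementation, only one plausible way such a check could work.

```python
from typing import Optional

# Hypothetical mapping from top-level package name to framework label.
SUPPORTED_FRAMEWORKS = {
    "sklearn": "scikit-learn",
    "xgboost": "XGBoost",
    "lightgbm": "LightGBM",
    "catboost": "CatBoost",
    "pyspark": "PySpark",
}

def deduce_framework(model) -> Optional[str]:
    """Return a framework label based on the module that defines the model's class.

    Assumes the model's class lives inside the framework's own package;
    returns None for anything unrecognized (e.g., custom wrapper classes).
    """
    top_level_package = type(model).__module__.split(".")[0]
    return SUPPORTED_FRAMEWORKS.get(top_level_package)
```

For example, a GradientBoostingRegressor is defined under `sklearn.ensemble`, so this helper would label it "scikit-learn", while a custom class defined in your own module would return None, which is roughly the situation where a packaged wrapper becomes necessary.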

If you can't ingest your model this way because of custom logic, feature transforms, or similar, consider using create_packaged_python_model() instead.

Models Without an Executable Object

Models without an executable object can be ingested by TruEra using the SDK's add_model() method. However, without the executable object, model predictions and influences must be computed externally, then added.

Add your externally computed predictions for a given split with add_model_predictions().

Add externally computed feature influences with add_model_feature_influences().

Add externally computed error influences with add_model_error_influences().
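When no executable object is ingested, predictions are produced outside TruEra. The sketch below fits a scikit-learn model locally on toy data and collects its predictions for one split into a pandas DataFrame keyed by a row ID. The column names (`id`, `prediction`) and the data are assumptions for illustration; check the add_model_predictions() reference for the exact format and arguments it expects.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy split: three numeric features plus a unique row ID (hypothetical data).
rng = np.random.default_rng(0)
feature_cols = ["f1", "f2", "f3"]
split_df = pd.DataFrame(rng.normal(size=(100, 3)), columns=feature_cols)
split_df["id"] = np.arange(len(split_df))
labels = 2.0 * split_df["f1"] - split_df["f2"] + rng.normal(scale=0.1, size=len(split_df))

# Fit the model outside TruEra; only its outputs will be uploaded.
model = LinearRegression().fit(split_df[feature_cols], labels)

# One prediction per row, aligned to the split's row IDs.
predictions_df = pd.DataFrame({
    "id": split_df["id"],
    "prediction": model.predict(split_df[feature_cols]),
})

# The upload would then go through the SDK (arguments omitted; see the API reference):
# tru.add_model_predictions(...)
```

Keeping the row ID alongside each prediction is what lets externally computed outputs be matched back to the rows of the ingested split.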

For more on adding feature and error influences, click here.

See Tutorial: Adding a Model for a notebook tutorial on virtual model ingestion.

Ingesting Feature Transformations

Using QII for influences, you can wrap any set of model transformations and provide influences with respect to pre-transformed (human-readable) features. However, this must be done locally and then added.
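Conceptually, wrapping transformations means composing them with the model into a single function of the raw features, so that a perturbation-based influence method only ever sees human-readable inputs. The sketch below does this with NumPy, then estimates a simplified unary QII-style influence: the mean absolute change in output when one raw feature is replaced by values drawn from other rows. The transform, model, and `unary_influence` helper are hypothetical; this is not TruEra's QII implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Raw, human-readable data: column 0 is age (years), column 1 is income (thousands).
X_raw = np.column_stack([rng.uniform(20, 70, 500), rng.uniform(10, 200, 500)])

# Hypothetical transform: z-score each raw column, using statistics from X_raw.
means, stds = X_raw.mean(axis=0), X_raw.std(axis=0)

def transform(x_raw):
    return (x_raw - means) / stds

# Hypothetical model trained on transformed features: a fixed linear scorer.
weights = np.array([0.5, 2.0])

def model(x_transformed):
    return x_transformed @ weights

def wrapped(x_raw):
    """The model as a function of raw features: transform first, then score."""
    return model(transform(x_raw))

def unary_influence(f, X, feature, rng):
    """Mean |f(x) - f(x')| where x' has one feature resampled (permuted) across rows."""
    X_intervened = X.copy()
    X_intervened[:, feature] = rng.permutation(X[:, feature])
    return np.mean(np.abs(f(X) - f(X_intervened)))

# Influence of age, then income, attributed to the raw (pre-transform) columns.
influences = [unary_influence(wrapped, X_raw, j, rng) for j in range(X_raw.shape[1])]
```

Because the influence is computed on `wrapped`, each score is attributed to a raw column such as age or income rather than to its z-scored counterpart.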

Complex transformations like dimensionality reductions (e.g., Principal Component Analysis, an unsupervised learning technique that reduces the dimensionality of data, increasing interpretability while minimizing information loss) can be packaged as part of the model object itself. However, optimizations can be enabled for simpler one-to-many feature transformations, which map a single pre-transform feature to a unique set of post-transform features. Examples of such feature transformations include normalizations, which align data values to a common scale or distribution (e.g., z-scores, which express a value's distance from the mean of a group of values in standard deviations, and mean corrections); one-hot encodings, which represent categorical variables as numerical values; multi-hot encodings, which binary-encode multiple tokens in a single vector; imputation, which retains most of a dataset's information by substituting a different value for missing data; and more.
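Packaging a complex transformation as part of the model object can be as simple as bundling it with the estimator. The sketch below uses a scikit-learn Pipeline so that PCA runs inside the single object's predict(); the toy data, the Ridge regressor, and the choice of three components are illustrative assumptions. Because every principal component mixes all input columns, PCA is not a one-to-many map and would not qualify for the optimizations described above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))  # ten raw features (toy data)
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

# PCA lives inside the model object: predict() takes the 10 raw columns,
# projects them onto 3 principal components, then applies the regressor.
packaged_model = Pipeline([
    ("pca", PCA(n_components=3)),
    ("regressor", Ridge()),
]).fit(X, y)

preds = packaged_model.predict(X[:5])
```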

To enable these optimizations, post-ingestion, you have two options (the first is recommended):

  1. Python models – if the transformation from raw human-readable data to model-readable data can be expressed as a function, add this function to the packaged model wrapper as an additional transform function. Details can be found in Custom Data Transformation.

  2. Java models – if the transformation cannot be simply expressed as a function, you can instead capture data before and after the transformation and ingest them as pre-transform data and post-transform data.
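For option 1, the transformation is simply a function from raw data to model-readable data. A minimal pandas sketch, assuming hypothetical columns `age` (numeric) and `color` (categorical), illustrative training-time statistics, and a hypothetical function name `transform`; how such a function is attached to the packaged model wrapper is covered in Custom Data Transformation and is not shown here.

```python
import pandas as pd

# Statistics fixed at training time (illustrative values).
AGE_MEAN, AGE_STD = 40.0, 10.0
COLOR_CATEGORIES = ["red", "green", "blue"]

def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Map human-readable columns to the columns the model was trained on."""
    out = pd.DataFrame(index=raw.index)
    # Normalization: the z-score keeps age as a single column (one-to-one).
    out["age_z"] = (raw["age"] - AGE_MEAN) / AGE_STD
    # One-hot encoding: color expands into one indicator column per category (one-to-many).
    for category in COLOR_CATEGORIES:
        out[f"color_{category}"] = (raw["color"] == category).astype(int)
    return out

raw = pd.DataFrame({"age": [30, 50], "color": ["red", "blue"]})
model_ready = transform(raw)
```

Under option 2, `raw` and `model_ready` would instead be captured and ingested as the pre-transform and post-transform data, sharing the same row index.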

In both cases, add a feature map from the pre-transformed data columns to the post-transformed data columns. You can do this with the Python SDK when creating the data collection.
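A feature map can be thought of as a dictionary from each pre-transform column to the post-transform columns derived from it. The sketch below builds one after a one-hot encoding done with `pandas.get_dummies`; the column names are hypothetical, and the exact argument name and format the SDK expects for the map is an assumption to confirm in the API reference.

```python
import pandas as pd

pre = pd.DataFrame({"age": [30, 50, 41], "color": ["red", "blue", "red"]})

# One-hot encode the categorical column; age passes through unchanged.
post = pd.get_dummies(pre, columns=["color"], prefix="color")

# Map each pre-transform column to the post-transform columns derived from it.
# This prefix heuristic assumes no pre-transform column name is a prefix of another.
feature_map = {
    col: [c for c in post.columns if c == col or c.startswith(f"{col}_")]
    for col in pre.columns
}
```

Each post-transform column appears under exactly one pre-transform column, which is the one-to-many property the optimizations rely on.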

Post-ingestion Linear and Tree Model Optimizations

TruEra optimizations for tree-based models and scikit-learn linear models (sklearn.linear_model) are enabled by ingesting the model object directly with the Python SDK. Alternatively, you can add a get_models() function to your packaged model wrapper (see Model Packaging and Execution).
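When a model goes through a packaged wrapper rather than being ingested directly, exposing the underlying estimator is what makes it available for these optimizations. The sketch below is a hypothetical wrapper class; the class name, its `predict` method, and the assumption that get_models() returns a list of fitted model objects are illustrative, so follow Model Packaging and Execution for the required signature.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class WrappedModel:
    """Hypothetical packaged-model wrapper around a fitted scikit-learn tree."""

    def __init__(self, model):
        self._model = model

    def predict(self, X):
        return self._model.predict(X)

    def get_models(self):
        # Assumption: expose the underlying fitted model object(s) for inspection.
        return [self._model]

X = np.arange(20, dtype=float).reshape(10, 2)
y = X[:, 0] * 2
wrapper = WrappedModel(DecisionTreeRegressor(max_depth=3).fit(X, y))
```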
