Basic Ingestion Methods¶

As with data ingestion, you can use the Python SDK to ingest (import) a model. Several paths are available, depending on whether and how your model is already packaged. If you're a new user, you may want to quickly review the general project structure supported by TruEra.

Otherwise, this topic on basic methods includes starter guidance on adding a model, computing and adding predictions, as well as computing and adding feature and error influences.

First things first

Before ingesting a model, you'll first need to set up the project and data structure the model belongs to. See Data Ingestion for guidance.

TruEra's Python SDK supports the ingestion of models with an executable Python object, as well as models without an executable Python object attached. If no executable object is available, model predictions and influences must be computed externally before being added.

Most Commonly Used Methods¶

As briefly introduced in Quickstart for Diagnostics, the most commonly used model ingestion method is add_python_model().

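from sklearn.ensemble import GradientBoostingRegressor

# XS and YS are the training features and labels prepared earlier;
# tru is an existing TruEra workspace object.

# Instantiate and fit the model.
gb_model = GradientBoostingRegressor()
gb_model.fit(XS, YS)

# Add the fitted model to the TruEra workspace.
tru.add_python_model("model 1", gb_model)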

However, if yours is an NLP model, only virtual model ingestion is supported.

# NLP only supports virtual models currently
tru.add_model("model")

For more on NLP, see NLP Diagnostics.

Additional methods are discussed next for:

Packaged Models
Unpackaged Models (models lacking an executable object)
Models requiring feature transforms

Packaged Models¶

Use TrueraWorkspace.add_packaged_python_model() to ingest a packaged model. This method registers and adds the new model, including the executable model object provided, deducing the model framework to appropriately serialize and upload the model object to the TruEra server.

Models of the supported frameworks can be passed directly: scikit-learn, XGBoost, LightGBM, CatBoost, and PySpark (tree models only).

If you are unable to ingest your model using this method because of custom logic, feature transforms, or similar requirements, consider using create_packaged_python_model().

Models Without an Executable Object¶

Models without an executable object can be ingested by TruEra using the SDK's add_model() method. However, without the executable object, model predictions and influences must be computed externally, then added.

Add your externally computed predictions for a given split with add_model_predictions().

Add externally computed feature influences with add_model_feature_influences().

Add externally computed error influences with add_model_error_influences().

For more on adding feature and error influences, click here.

See Tutorial: Adding a Model for a notebook tutorial on virtual model ingestion.

Ingesting Feature Transformations¶

Using QII for influences, you can wrap any set of model transformations and provide influences with respect to pre-transformed (human-readable) features. However, this must be done locally and then added.

Complex transformations like dimensionality reductions can be packaged as a part of the model object itself. However, there are optimizations that can be enabled for simpler one-to-many feature transformations that map a single pre-transform feature to a unique set of post-transform features. Examples of such feature transformations include mean corrections, among others.

To enable these optimizations, post-ingestion, you have two options (the first is recommended):

Python models – if the transformation from raw human-readable data to model-readable data can be expressed as a function, add this function to the packaged model wrapper as an additional transform function. Details can be found in Custom Data Transformation.
Java models – if the transformation cannot be simply expressed as a function, you can instead capture data before and after the transformation and ingest them as pre-transform data and post-transform data.

In both cases, add feature mapping from the columns of pre-transformed data to the post-transformed data. This can be done with the Python SDK during data collection creation.
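As a minimal sketch of what a one-to-many feature mapping might look like, the example below one-hot encodes one categorical column and records which post-transform columns derive from each pre-transform column. The function names (one_hot_transform, build_feature_map, validate_feature_map), the column names, and the dictionary layout are all assumptions made for illustration. They are not part of the TruEra SDK, and the format the SDK expects may differ.

```python
def one_hot_transform(rows, categorical, categories):
    """Map raw (pre-transform) rows to model-readable (post-transform) rows.

    rows: list of dicts keyed by pre-transform column name.
    categorical: name of the single column to one-hot encode.
    categories: ordered list of the values that column may take.
    """
    transformed = []
    for row in rows:
        new_row = {}
        for col, value in row.items():
            if col == categorical:
                # One pre-transform column fans out to several 0/1 columns.
                for cat in categories:
                    new_row[f"{col}_{cat}"] = 1 if value == cat else 0
            else:
                new_row[col] = value
        transformed.append(new_row)
    return transformed


def build_feature_map(pre_columns, categorical, categories):
    """Return {pre-transform column: [post-transform columns]}."""
    feature_map = {}
    for col in pre_columns:
        if col == categorical:
            feature_map[col] = [f"{col}_{cat}" for cat in categories]
        else:
            feature_map[col] = [col]
    return feature_map


def validate_feature_map(feature_map, post_columns):
    """Check that every post-transform column is claimed by exactly one pre-transform column."""
    claimed = [c for cols in feature_map.values() for c in cols]
    no_duplicates = len(claimed) == len(set(claimed))
    return no_duplicates and sorted(claimed) == sorted(post_columns)


# Hypothetical data: "state" is the categorical pre-transform feature.
pre_rows = [{"age": 34, "state": "CA"}, {"age": 51, "state": "NY"}]
states = ["CA", "NY", "TX"]

post_rows = one_hot_transform(pre_rows, "state", states)
feature_map = build_feature_map(["age", "state"], "state", states)
```

Validating the mapping before ingestion catches the case where a post-transform column is attributed to two pre-transform features, which would break the one-to-many assumption described above.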

Post-ingestion Linear and Tree Model Optimizations¶

TruEra optimizations for tree-based models and scikit-learn sklearn.linear_model models are enabled by ingesting the model object directly using the Python SDK. Alternatively, you can add a get_models() function to your packaged model wrapper (see Model Packaging and Execution).


truera-qii – unbiased estimates of Shapley values that quantify the contribution (influence) of individual features in making a model’s decision on a given datapoint. See Understanding QII for additional details.
tree-shap-tree-path-dependent – for tree-based models, the average change in model output is conditioned on the given feature when introducing features one at a time over all feature orderings (see shap.explainers.tree)
tree-shap-interventional – breaks the dependencies between features according to the rules dictated by causal inference requiring a background dataset; runtime scales linearly with the background dataset (see Understanding Interventional TreeSHAP)
kernel-shap – model-agnostic method to approximate SHAP values using ideas from LIME and Shapley values (see shap.KernelExplainer)
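The common idea behind these Shapley-style influences, introducing features one at a time and averaging each feature's marginal effect over all feature orderings, can be illustrated with a brute-force calculation. The sketch below is for intuition only: it is neither TruEra's QII estimator nor the shap library's implementation, it uses a single baseline point rather than a background dataset, and toy_model is a hypothetical stand-in for a real model.

```python
from itertools import permutations


def exact_shapley(model, x, baseline):
    """Brute-force Shapley values of `model` at point `x` relative to `baseline`.

    Features are switched from their baseline value to their value in `x`
    one at a time, and each feature's marginal change in model output is
    averaged over every possible ordering. The cost grows factorially with
    the number of features, so this is only practical for a handful of them.
    """
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for i in order:
            current[i] = x[i]  # introduce feature i
            output = model(current)
            totals[i] += output - previous_output
            previous_output = output
    return [total / len(orderings) for total in totals]


def toy_model(features):
    # Hypothetical model: a linear term plus an interaction between features 1 and 2.
    return 2.0 * features[0] + features[1] * features[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
influences = exact_shapley(toy_model, x, baseline)
```

The influences sum to the difference between the model output at x and at the baseline, and the two interacting features share their joint contribution equally.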

Upload error influences using the SDK's add_model_error_influences() method, where error_influence_data is a pd.DataFrame already aligned with the pre-processed data of the given split.
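Before uploading, it can help to confirm that the influence DataFrame really is aligned with the pre-processed data. The sketch below assumes that "aligned" means one influence column per feature column and the same row index in the same order. That interpretation, the helper name check_influence_alignment, and the sample values are assumptions for illustration, and the upload call itself is left as a comment because its exact signature is not shown in this topic.

```python
import pandas as pd


def check_influence_alignment(pre_data, influence_data):
    """Return a list of problems; an empty list means the two frames line up."""
    problems = []
    if list(influence_data.columns) != list(pre_data.columns):
        problems.append("columns differ from the pre-processed data")
    if not influence_data.index.equals(pre_data.index):
        problems.append("row index differs from the pre-processed data")
    return problems


# Hypothetical pre-processed split data, indexed by row id.
pre_data = pd.DataFrame(
    {"age": [34, 51, 29], "income": [72000, 58000, 91000]},
    index=[101, 102, 103],
)

# Hypothetical externally computed error influences for the same rows.
error_influence_data = pd.DataFrame(
    {"age": [0.12, -0.03, 0.40], "income": [-0.20, 0.08, 0.05]},
    index=[101, 102, 103],
)

problems = check_influence_alignment(pre_data, error_influence_data)
# If `problems` is empty, pass error_influence_data to
# add_model_error_influences() as described in the SDK reference.
```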

When generating feature influences or error influences, two orthogonal axes determine which score_type values are allowed:

Project Type — classification, regression, or ranking
Scope — applies to the entire project or only to a specific computation, such as influences.

Using these determinants, the possible values allowed for score_type are:

Classification for entire project: probits, logits, or classification
probits – probability based on deviation from the mean of a standard distribution; as a form of binary regression, the dependent variable can take only two values, for example married or not married.
logits – also known as the log-odds function; maps probability values between 0 and 1 onto the range from negative infinity to infinity.
classification – any score or metric used to compute the performance of the classification; i.e., how well it works and its predictive power.
Classification for a specific computation: probits, logits, classification, or mean_absolute_error_for_classification
mean_absolute_error_for_classification – takes the average of absolute errors for a group of predictions and observations as a measurement of the magnitude of errors for the entire group.
Regression for entire project: regression
regression – measures the average deviation of the errors of the regression model, where the smaller the value of the standard error of the estimate, the better the fit of the regression model to the data.
Regression for a specific computation: regression or mean_absolute_error_for_regression
mean_absolute_error_for_regression – measures the average size of the mistakes in a collection of predictions, without taking their direction into account; the average absolute difference between the predicted values and the actual values, used to assess the effectiveness of a regression model.
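The combinations listed above can be captured in a small lookup table, shown here as a sketch alongside plain-Python versions of the log-odds and mean absolute error calculations the definitions refer to. The scope labels "project" and "computation", the helper names, and the table itself are illustrative assumptions rather than SDK objects. Ranking projects are omitted because no ranking values are listed above.

```python
import math

# (project type, scope) -> allowed score_type values, as listed in the text.
ALLOWED_SCORE_TYPES = {
    ("classification", "project"): {"probits", "logits", "classification"},
    ("classification", "computation"): {
        "probits",
        "logits",
        "classification",
        "mean_absolute_error_for_classification",
    },
    ("regression", "project"): {"regression"},
    ("regression", "computation"): {
        "regression",
        "mean_absolute_error_for_regression",
    },
}


def is_allowed(project_type, scope, score_type):
    """Return True if score_type is listed for this project type and scope."""
    return score_type in ALLOWED_SCORE_TYPES.get((project_type, scope), set())


def logit(p):
    """Log-odds of a probability p, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predictions and observations."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


ok_for_project = is_allowed("classification", "project", "logits")
mae_for_project = is_allowed("regression", "project", "mean_absolute_error_for_regression")
```

Here mae_for_project is False because the mean absolute error score types are listed only for a specific computation, not for the project as a whole.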