Quantitative Input Influence (QII): unbiased estimates of Shapley values that quantify the contribution (influence) of individual features to a model's decision on a given datapoint.
Model Packaging and Execution¶
Unable to serialize your model?
Should you be unable to serialize an executable model object, consider ingesting a virtual model.
Standard packaging and serialization of an unpackaged model entails creating and adding the packaged model locally using the Python SDK's:
- tru.create_packaged_python_model() method to create the package
- tru.add_packaged_python_model() method to upload the packaged model
Once packaging and serialization is complete, TruEra attempts to run the ingested model by:
- Building the model environment (for Python models only, this entails installing any specified pip packages; see Python/PIP prerequisites).
- Deserializing and loading the model object.
- Calling your model to generate predictions based on a split.
Use the following checklist to help ensure successful ingestion and execution of your model:
TruEra Model Ingestion Checklist
- Necessary packages/dependencies are specified.
- Model wrapper is formatted correctly; all necessary files are present.
- Model can be deserialized or the model wrapper can be loaded.
- Model wrapper works on test data and generates predictions.
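The last two checklist items can be tried locally before uploading. The following is a minimal sketch, not part of the TruEra SDK: the helper name smoke_test_wrapper is hypothetical, and it assumes the wrapper file exposes a _load_pyfunc(path) function that returns an object with a predict method, as described under Python Model Wrappers.

```python
import importlib.util


def smoke_test_wrapper(wrapper_file, model_path, sample):
    """Load a model wrapper from disk and call predict() on sample data.

    Hypothetical local check; it only mirrors the "deserialize, then
    predict" steps and does not build the model environment.
    """
    # Import the wrapper module directly from its file path.
    spec = importlib.util.spec_from_file_location("model_wrapper", wrapper_file)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    # The wrapper is expected to expose _load_pyfunc(path).
    if not hasattr(module, "_load_pyfunc"):
        raise AttributeError(f"{wrapper_file} does not define _load_pyfunc(path)")

    # Deserialize/load the model, then confirm it can generate predictions.
    model = module._load_pyfunc(str(model_path))
    if not callable(getattr(model, "predict", None)):
        raise TypeError("_load_pyfunc(path) must return an object with predict(df)")

    return model.predict(sample)
```

In practice, sample would be a small pd.DataFrame taken from one of your splits, and the returned predictions can be inspected for the expected shape.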
It's important to remember that the Python SDK supports direct upload of
scikit-learn model objects from within your Python environment. Moreover, calling
tru.add_python_model() will automatically serialize the model, infer any
pip dependencies, format the model wrapper, and upload it to TruEra.
Hence, if you are providing a natively supported model type, you should upload the model object directly via the SDK to avoid the additional overhead of formatting and testing a model wrapper.
If, however, one of the following is true, you'll need to manually package your model:
- your model is Java-based
- your model includes custom code supplementing a more generic modeling framework
- TruEra's out-of-the-box flow fails
See Ingesting Custom Python Models via Custom Model Wrapper for an end-to-end tutorial on ingesting custom ensemble models.
TruEra's recommended serialization format for standard model wrappers is covered next.
The general structure for a packaged model directory looks like this:
📂 DIRECTORY OF PACKAGED MODEL
┣ 📜 conda.yaml (Python models only)
┣ 📜 MLmodel
┣ 📜 model.pkl
┗ 📂 code
   ┗ 📜 model_wrapper.py (Python models only)
Components of the packaged model directory comprise:
- conda.yaml – Python models only; specifies the Python environment required to load the model, such as any pip dependencies, and follows Conda's standard YAML config.
- MLmodel – contains information about the flavor/framework of the packaged model, as well as the entry point to launch the model, such as the path to the pickle or JAR file. The MLmodel file is typically auto-generated when the model is packaged via the CLI or SDK and rarely needs to be edited.
- model.pkl – serialized model (pickle, jar, zip, etc.)
- code – contains the wrapper
- model_wrapper.py – Python models only; contains a Python class of wrapping functions that load and call the model to generate predictions.
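The layout above can be verified before upload with a few lines of standard-library Python. This is a sketch under stated assumptions: missing_package_files is a hypothetical helper, not an SDK function, and it skips the serialized model file because its name and format vary (pickle, jar, zip, etc.).

```python
from pathlib import Path


def missing_package_files(package_dir, python_model=True):
    """Return the expected entries that are absent from a packaged model directory."""
    root = Path(package_dir)
    # MLmodel is expected for every packaged model.
    expected = ["MLmodel"]
    # conda.yaml and the wrapper are listed above as "Python models only".
    if python_model:
        expected += ["conda.yaml", "code/model_wrapper.py"]
    return [name for name in expected if not (root / name).exists()]
```

An empty list means every expected entry was found; any names returned point to files that still need to be added.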
Model wrapper format is covered next.
Python Model Wrappers¶
A typical Python model wrapper might look like the following:
import numpy as np
import pandas as pd
import tensorflow as tf


class PredictProbaWrapper(object):
    def __init__(self, model):
        self.model = model

    def predict(self, df):
        # Score the whole dataframe in a single batch.
        predictions = self.model.predict(df, batch_size=len(df))
        # Convert the raw model output to the probability of class 1.
        prob_1 = tf.sigmoid(predictions)
        # One column per class: P(class 0), P(class 1).
        probs = np.hstack([1.0 - prob_1, prob_1])
        return pd.DataFrame(probs, columns=[0, 1])


def _load_model_from_local_file(path):
    return PredictProbaWrapper(tf.keras.models.load_model(path))


def _load_pyfunc(path):
    # Entry point: path is the location passed during packaging.
    return _load_model_from_local_file(path)
- PredictProbaWrapper – a wrapper class containing one model attribute (self.model) and one function (predict(df)) to calculate the model's classification probabilities.
- _load_pyfunc(path) – retrieves the loaded model from path, which corresponds to the --model_path flag passed during packaging. This _load_pyfunc(path) function returns an instance of an object with a predict(df) method, as demonstrated by PredictProbaWrapper above.
- predict(df) – function in PredictProbaWrapper that reads in the given pd.DataFrame, generates predictions using self.model, and returns a corresponding dataframe of per-class probabilities for each example (i.e., the rows match up).
The shape of the pd.DataFrame returned by predict for a pd.DataFrame input of n rows and m columns should be:
- (n) if the model is a regressor (single column with the regression value)
- (n, 2) if the model is a classifier (one column per binary outcome of 0 or 1, with logits or probits values)
Every package imported by your custom code should be listed as a dependency in your model's conda.yaml file.
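One way to collect the packages a wrapper imports is to parse its source with the standard library's ast module. The sketch below is illustrative only: third_party_imports is a hypothetical helper, and filtering out standard-library modules relies on sys.stdlib_module_names, which exists only on Python 3.10 and later (on older versions nothing is filtered).

```python
import ast
import sys


def third_party_imports(source):
    """Return the top-level module names imported by source, excluding the stdlib."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b as c" contributes the top-level name "a".
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from a.b import c" contributes "a"; relative imports are skipped.
            modules.add(node.module.split(".")[0])
    # Available on Python 3.10+; otherwise no stdlib filtering is applied.
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return sorted(modules - set(stdlib))
```

Note that an import name is not always the package name to install (for example, the sklearn module is installed as scikit-learn), so the result is a starting point for reviewing conda.yaml rather than a drop-in dependency list.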
Custom Data Transformation Packaging Optimizations¶
Careful: Segregating a transformation function from the model predictor is useful only where the model transformation is a one-to-many mapping between pre- and post-transform features.
Implementing your custom transformation within the wrapper file may require access to a transformer object already fitted on the training data. When you pass the --model_path flag during packaging, the path parameter passed to _load_pyfunc points to the file or directory you provided. Thus, just as with the model itself, you can provide an arbitrary serialized transformer object within that directory.
Consequently, if you wish to package a one-to-many model transform function alongside your Python model, modify the packaged model wrapper to include a method with the following signature:
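def transform(self, pre_transform_df: pd.DataFrame) -> pd.DataFrame
In the following example, the model is packaged with the --model_path /path/to/model_data option. The model_data directory contains the files that _load_pyfunc reads, model.pkl and transform.pkl: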
import os

import cloudpickle as pickle
import pandas as pd


class PredictProbaWrapper(object):
    def __init__(self, model, transformer):
        self.model = model
        self.transformer = transformer

    def predict(self, model_input):
        return self.model.predict_proba(model_input)

    def transform(self, pre_transform_df: pd.DataFrame) -> pd.DataFrame:
        # Apply the fitted transformer and label the resulting columns.
        # Note: scikit-learn 1.0+ transformers provide get_feature_names_out();
        # get_feature_names() was removed in scikit-learn 1.2.
        post_transform_df = pd.DataFrame(
            self.transformer.transform(pre_transform_df),
            columns=self.transformer.get_feature_names())
        return post_transform_df


def _load_pyfunc(path: str):
    # path is the directory supplied via --model_path during packaging.
    with open(os.path.join(path, "model.pkl"), "rb") as fp:
        loaded_model = pickle.load(fp)
    with open(os.path.join(path, "transform.pkl"), "rb") as fp:
        loaded_transformer = pickle.load(fp)
    return PredictProbaWrapper(loaded_model, loaded_transformer)
Remember, you must also add a feature map to the data collection specifying the mapping between pre- and post-transform features. This can be done using
add_data_collection() during creation of the data collection.
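To illustrate what a one-to-many feature map contains, the sketch below groups post-transform columns under the pre-transform feature they derive from. It rests on assumptions: build_feature_map is a hypothetical helper, post-transform columns are assumed to be named after their source feature plus a "_" suffix (as a one-hot encoder might produce), and the exact argument format expected by add_data_collection() is not covered in this section, so check the SDK reference before passing the result along.

```python
def build_feature_map(pre_features, post_features, sep="_"):
    """Map each pre-transform feature to the post-transform columns derived from it."""
    feature_map = {name: [] for name in pre_features}
    # Match longer names first so "age_group" wins over "age" for "age_group_young".
    ordered = sorted(pre_features, key=len, reverse=True)
    for column in post_features:
        for name in ordered:
            if column == name or column.startswith(name + sep):
                feature_map[name].append(column)
                break
        else:
            # No pre-transform feature matches the assumed naming convention.
            raise ValueError(f"no pre-transform feature found for {column!r}")
    return feature_map
```

For instance, a categorical feature color with post-transform columns color_red and color_blue maps to both of them, while a numeric feature passed through unchanged maps to a single column of the same name.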
Check out our feature transformation notebook tutorial to see an end-to-end example of this at work in the Python SDK.
Tree and Linear QII Packaging Optimizations¶
To enable these optimizations, ingesting the model directly using the Python SDK is strongly recommended, as they are then added automatically. If using the Python SDK is not an option, tree and linear optimizations can be enabled by modifying the model wrapper class shown above: add a get_models() method that returns the underlying model object.
If the get_models() method is implemented in the provided model wrapper, the corresponding predict method must not implement custom logic. Instead, it must call the underlying model and return the resulting model scores. Any additional transformations to the data must occur within the transform method of the wrapper as described above or be added as post-processed split data.
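A wrapper following these rules might look like the minimal sketch below. The class name TreeModelWrapper is hypothetical, and the sketch assumes that get_models() returns the model object itself and that the underlying model exposes predict_proba; confirm the expected return form against the SDK reference.

```python
class TreeModelWrapper(object):
    """Hypothetical wrapper exposing the underlying model through get_models()."""

    def __init__(self, model):
        self.model = model

    def predict(self, df):
        # No custom logic: call the underlying model and return its scores.
        return self.model.predict_proba(df)

    def get_models(self):
        # Assumption: the underlying model object is returned as-is.
        return self.model
```

Because predict only delegates to the model, any feature engineering belongs in a separate transform method or in the split data itself.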
These optimizations are supported for the following model classes:
- Tree-based models –
- sklearn.pipeline – if any of the above models is the last step of the pipeline