Model Packaging and Execution¶
Unable to serialize your model?
Should you be unable to serialize an executable model object, consider ingesting a virtual model.
Standard packaging and serialization of an unpackaged model entails creating and adding the packaged model locally using two Python SDK methods:
- `tru.create_packaged_python_model()` to create the package
- `tru.add_packaged_python_model()` to upload the packaged model
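A minimal sketch of this two-step flow is shown below. The argument names and the local path are illustrative assumptions, not the documented signatures; consult the SDK reference for the exact parameters.

```python
# Hedged sketch of the two-step packaging flow; argument names and the
# local path are assumptions for illustration, not documented signatures.
packaged_model_path = "/tmp/my_packaged_model"

# Step 1: create the packaged model directory locally.
tru.create_packaged_python_model(packaged_model_path, model_obj=my_model)

# Step 2: upload the packaged model directory to TruEra.
tru.add_packaged_python_model("my_model", packaged_model_path)
```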
Once packaging and serialization are complete, TruEra attempts to run the ingested model by:
- Building the model environment (for Python models only, this entails installing any specified `pip` packages; see Python/PIP prerequisites).
- Deserializing and loading the model object.
- Calling your model to generate predictions based on a split.
Use the following checklist to help ensure successful ingestion and execution of your model:
TruEra Model Ingestion Checklist
- Necessary packages/dependencies are specified.
- Model wrapper is formatted correctly; all necessary files are present.
- Model can be deserialized or the model wrapper can be loaded.
- Model wrapper works on test data and generates predictions.
It's important to remember that the Python SDK supports direct upload of scikit-learn model objects from within your Python environment. Moreover, calling `tru.add_python_model()` will automatically serialize the model, infer any `pip` dependencies, format the model wrapper, and upload it to TruEra. Hence, if you are providing a natively supported model type, you should upload the model object directly via the SDK to avoid the additional overhead of formatting and testing a model wrapper.
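For example, a natively supported scikit-learn model can be ingested in a couple of lines. The sketch below assumes a connected `tru` workspace object and training data; the argument names are illustrative rather than the documented signature.

```python
# Hedged sketch: assumes a connected TruEra workspace object `tru` and
# training data X_train/y_train; argument names are illustrative only.
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier().fit(X_train, y_train)
tru.add_python_model("my_rf_model", model)  # serializes, infers pip deps, uploads
```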
If, however, any of the following is true, you'll need to manually package your model:
- your model is Java-based
- your model includes custom code supplementing a more generic modeling framework
- TruEra's out-of-the-box flow fails
See Ingesting Custom Python Models via Custom Model Wrapper for an end-to-end tutorial on ingesting custom ensemble models.
TruEra's recommended serialization format for standard model wrappers is covered next.
Serialization Structure¶
The general structure for a packaged model directory looks like this:
```
📁 DIRECTORY OF PACKAGED MODEL
├── conda.yaml (Python models only)
├── MLmodel
├── model.pkl
└── code
    └── model_wrapper.py (Python models only)
```
Components of the packaged model directory comprise:
- `conda.yaml` (Python models only): specifies the Python environment required to load the model, such as any `pip` dependencies, and follows Conda's standard YAML config (a minimal example follows this list).
- `MLmodel`: contains information about the flavor/framework of the packaged model, as well as the entry point to launch the model, such as the path to the pickle or JAR file. The `MLmodel` file is typically auto-generated when the model is packaged via the CLI or SDK and rarely needs to be edited.
- `model.pkl`: the serialized model (pickle, jar, zip, etc.).
- `code`: contains the wrapper.
  - `model_wrapper.py` (Python models only): contains a Python class of wrapping functions that load and call the model to generate predictions.
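As an illustration, a minimal `conda.yaml` might look like the following; the package names and versions are placeholders for whatever your model actually requires.

```yaml
# Placeholder environment spec following Conda's standard YAML config;
# pin the Python version and packages your model actually needs.
name: model-environment
channels:
  - defaults
dependencies:
  - python=3.8
  - pip
  - pip:
      - tensorflow==2.11.0
      - pandas
      - numpy
```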
Model wrapper format is covered next.
Python Model Wrappers¶
A typical Python model wrapper might look like the following:
Example
```python
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras


class PredictProbaWrapper(object):
    def __init__(self, model):
        self.model = model

    def predict(self, df):
        # Generate raw model outputs (logits) for the whole batch.
        predictions = self.model.predict(df, batch_size=len(df))
        # Convert logits to probabilities for the positive class.
        prob_1 = tf.sigmoid(predictions)
        # Stack the two class probabilities side by side.
        probs = np.hstack([1.0 - prob_1, prob_1])
        return pd.DataFrame(probs, columns=[0, 1])


def _load_model_from_local_file(path):
    return PredictProbaWrapper(tf.keras.models.load_model(path))


def _load_pyfunc(path):
    return _load_model_from_local_file(path)
```
Specification
- `PredictProbaWrapper`: a wrapper class containing one model attribute (`self.model`) and one function (`predict(df)`) to calculate the model's classification probabilities.
- `_load_pyfunc(path)`: retrieves the loaded model from `path`, corresponding to the `--model_path` flag passed during packaging. This function returns an instance of an object with a `predict(df)` method, as demonstrated by the `PredictProbaWrapper` class.
- `predict(df)`: function in `PredictProbaWrapper` that reads in the given `pd.DataFrame`, generates predictions using `self.model`, and returns a corresponding dataframe of per-class probabilities for each example (i.e., the rows match up).
The shape of the `pd.DataFrame` returned by `predict` for a `pd.DataFrame` input of n rows and m columns should be:
- `(n, 1)` if the model is a regressor (a single column with the regression value)
- `(n, 2)` if the model is a classifier (one column per binary outcome of 0 or 1, with logit or probit values)
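As a quick sanity check before packaging, you can verify the wrapper's output shape on a small test frame. The sketch below assumes the binary-classifier `PredictProbaWrapper` defined above and a placeholder `my_model`.

```python
# Hedged sanity check, assuming the PredictProbaWrapper defined above.
import numpy as np
import pandas as pd

test_df = pd.DataFrame(np.random.rand(5, 3))  # n=5 rows, m=3 feature columns
wrapper = PredictProbaWrapper(my_model)       # my_model: your trained model
output = wrapper.predict(test_df)
assert output.shape == (5, 2)  # binary classifier: one column per class
```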
All the imports you use in your custom code should be included in your model's `conda.yaml` file.
Custom Data Transformation Packaging Optimizations¶
Careful
Segregating a transformation function from the model predictor is useful only where the model transformation is a one-to-many mapping between pre- and post-transform features.
Accessing a transformer object already fitted using training data may be necessary to implement your custom transformation within the wrapper file. When the `--model_path` flag is used during packaging, the `path` parameter passed to `_load_pyfunc` points to the file or directory provided. Thus, similar to the model itself, you can provide an arbitrary serialized transformer object within the directory.
Consequently, if you wish to package a one-to-many model transform function alongside your Python model, modify the packaged model wrapper to include a method with the following signature:

```python
def transform(self, pre_transform_df: pd.DataFrame) -> pd.DataFrame
```

For example, suppose the model is packaged with the `--model_path /path/to/model_data` option, where the `model_data` directory contains the files `model.pkl` and `transform.pkl`. The wrapper below loads both objects:
```python
import os

import cloudpickle as pickle
import pandas as pd


class PredictProbaWrapper(object):
    def __init__(self, model, transformer):
        self.model = model
        self.transformer = transformer

    def predict(self, model_input):
        return self.model.predict_proba(model_input)

    def transform(self, pre_transform_df: pd.DataFrame) -> pd.DataFrame:
        # Apply the fitted transformer and relabel the post-transform columns.
        post_transform_df = pd.DataFrame(
            self.transformer.transform(pre_transform_df),
            columns=self.transformer.get_feature_names())
        return post_transform_df


def _load_pyfunc(path: str):
    # path points to the directory supplied via --model_path; load both
    # the serialized model and the fitted transformer from it.
    with open(os.path.join(path, "model.pkl"), "rb") as fp:
        loaded_model = pickle.load(fp)
    with open(os.path.join(path, "transform.pkl"), "rb") as fp:
        loaded_transformer = pickle.load(fp)
    return PredictProbaWrapper(loaded_model, loaded_transformer)
```
Remember, you must also add a feature map to the data collection specifying the mapping between pre- and post-transform features. This can be done using `add_data_collection()` during creation of the data collection.
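A hedged sketch of what such a feature map might look like is shown below; the parameter name `pre_to_post_feature_map` and the collection name are assumptions for illustration, so check the SDK reference for the exact signature.

```python
# Hedged sketch: the parameter name is an assumption, not the documented
# signature. Maps each pre-transform feature to its post-transform columns.
feature_map = {
    "color": ["color_red", "color_green", "color_blue"],  # one-to-many (one-hot)
    "age": ["age"],                                       # passed through unchanged
}
tru.add_data_collection("my_data_collection",
                        pre_to_post_feature_map=feature_map)
```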
Tip
Check out our feature transformation notebook tutorial to see an end-to-end example of this at work in the Python SDK.
Tree and Linear QII Packaging Optimizations¶
TruEra supports optimized QII computation for tree-based models and linear models (e.g., those in `sklearn.linear_model`). To enable these, ingesting the model directly using the Python SDK is strongly recommended, as these optimizations are added automatically. If using the Python SDK is not an option, then tree and linear optimizations can be enabled by modifying the model wrapper class shown above to implement a `get_models()` method that returns the underlying model object, as demonstrated in the example at the end of this section.
Careful
If the `get_models()` method is implemented in the provided model wrapper, the corresponding `predict` method must not implement custom logic; instead, it must simply call the underlying model and return the resulting model scores. Any additional transformations to the data must occur within the `transform` method of the wrapper, as described above, or be added as post-processed split data.
These optimizations are supported for the following model classes:
- `sklearn.linear_model`: `LogisticRegression` or `LinearRegression`
- Tree-based models: `sklearn.ensemble.RandomForestClassifier`, `sklearn.tree.DecisionTreeClassifier`, etc.
- `XGBoost` models
- `CatBoost` models
- `LightGBM` models: `lightgbm.Booster`, `lightgbm.LGBMClassifier`, and `lightgbm.LGBMRegressor`
- `sklearn.pipeline`: if any of the above models is the last step of the pipeline
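The example below sketches such a wrapper. It follows the `PredictProbaWrapper` pattern above; the exact expectations of `get_models()` (e.g., whether a single object or a list is returned) should be confirmed against the SDK reference.

```python
# Hedged sketch of a wrapper enabling tree/linear QII optimizations by
# exposing get_models(); assumes self.model is one of the supported classes.
class PredictProbaWrapper(object):
    def __init__(self, model):
        self.model = model

    def predict(self, df):
        # With get_models() implemented, predict must simply call the
        # underlying model and return its scores (no custom logic).
        return self.model.predict_proba(df)

    def get_models(self):
        # Return the underlying model object so TruEra can apply the
        # optimized QII computation directly to it.
        return self.model
```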