Basic Ingestion Methods¶
Similar to data ingestion, you can use the Python SDK for model ingestion/import with multiple paths depending on whether your model is already packaged and how. If you're a new user, you may want to do a quick review of the general project structure supported by TruEra.
Otherwise, this topic on basic methods includes starter guidance on adding a model, computing and adding predictions, as well as computing and adding feature and error influences.
First things first

Data Split: One of two or more subsets of the data collection. Typically, with a two-part split, one part is used to evaluate or test the model, while the other is used to train it.

Packaged Model: An executable Python model object, along with a collection of modules arranged in a hierarchy of folders that includes an __init__.py file.
Use TrueraWorkspace.add_packaged_python_model() to ingest a packaged model. This method registers and adds the new model, including the provided executable model object, deducing the model framework in order to appropriately serialize and upload the model object to the TruEra server.
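As a minimal sketch of this path, the helper below assumes a connected TrueraWorkspace instance (here called tru). add_packaged_python_model() is the method named above; the project/data-collection setup calls and the exact argument names are assumptions to verify against the SDK reference.

```python
def ingest_packaged_model(tru, project, data_collection, model_name, model_dir):
    """Register a packaged Python model with a TruEra workspace.

    `add_packaged_python_model` is the method named in this guide; the
    setup calls and argument names below are assumptions.
    """
    tru.set_project(project)                  # select the target project
    tru.set_data_collection(data_collection)  # the data the model scores
    tru.add_packaged_python_model(model_name, model_dir)
```

The directory passed as model_dir would be the packaged-model folder hierarchy (including its __init__.py) described above.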
If you are unable to ingest your model using this method due to custom logic, feature transforms, or similar constraints, consider ingesting it as a model without an executable object, as described next.
Models Without an Executable Object¶
Models without an executable object can be ingested by TruEra using the SDK's add_model() method (discussed with example code in the Quickstart). However, model predictions and influences must be computed externally and then added:
- Add your externally computed predictions for a given split.
- Add externally computed feature influences.
- Add externally computed error influences.
For more on adding feature and error influences, click here.
See Tutorial: Adding a Model for a notebook tutorial on virtual model ingestion.
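The virtual-model flow can be sketched as follows, again assuming a connected TrueraWorkspace instance. add_model() is named above; the names and parameters of the prediction/influence upload methods are assumptions, so check the SDK reference before use.

```python
def ingest_virtual_model(tru, model_name, split_name, predictions, influences):
    """Register a model with no executable object, then attach artifacts.

    `add_model` is named in this guide; the upload method names and their
    `data_split_name` parameter below are assumptions.
    """
    tru.add_model(model_name)  # registers a "virtual" model
    # Everything below was computed externally, outside of TruEra:
    tru.add_model_predictions(predictions, data_split_name=split_name)
    tru.add_model_feature_influences(influences, data_split_name=split_name)
```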
Ingesting Feature Transformations¶
Using QII for influences, you can wrap any set of model transformations and provide influences with respect to pre-transformed (human-readable) features. However, this must be done locally and then added.
Common transformations include Principal Component Analysis (an unsupervised learning technique for reducing the dimensionality of data, increasing interpretability while minimizing information loss), one-hot encoding (representing categorical variables as numerical values), and multi-hot encoding (binary encoding of multiple tokens in a single vector).

To enable this for an ingested model, you have two options (the first is recommended):
Python models – if the transformation from raw human-readable data to model-readable data can be expressed as a function, add this function to the packaged model wrapper as an additional transform function. Details can be found in Custom Data Transformation.
Java models – if the transformation cannot be simply expressed as a function, you can instead capture data before and after the transformation and ingest them as pre-transform data and post-transform data.
In both cases, add feature mapping from the columns of pre-transformed data to the post-transformed data. This can be done with the Python SDK during data collection creation.
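To illustrate, here is a hypothetical transform function for a dataset with columns age and color, together with the corresponding feature map from pre-transform columns to the post-transform columns they produce. The column names are invented for illustration, and the exact parameter that accepts the feature map during data collection creation varies by SDK version.

```python
import pandas as pd

def transform(pre_df: pd.DataFrame) -> pd.DataFrame:
    """Map raw human-readable data to model-readable data by one-hot
    encoding the categorical column "color" (hypothetical example)."""
    return pd.get_dummies(pre_df, columns=["color"])

# Feature map: each pre-transform column -> the post-transform columns
# it produces. Supplied when the data collection is created.
FEATURE_MAP = {
    "age": ["age"],
    "color": ["color_blue", "color_red"],
}
```

With this mapping in place, influences computed on the one-hot columns can be reported against the original, human-readable color feature.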
Post-ingestion Linear and Tree Model Optimizations¶
TruEra optimizations for tree-based models and scikit-learn sklearn.linear_model models are enabled by ingesting the model object directly using the Python SDK. Alternatively, you can add a get_models() function to your packaged model wrapper (see Model Packaging and Execution).
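A sketch of the wrapper approach is shown below: the wrapper exposes the raw estimator through get_models() so the underlying linear model can be detected. The class name is invented, and whether get_models() should return the estimator itself or a list of estimators is an assumption to verify against Model Packaging and Execution.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

class PackagedLinearModel:
    """Hypothetical packaged-model wrapper exposing its raw estimator."""

    def __init__(self, model: LinearRegression):
        self._model = model

    def predict(self, X):
        # Standard prediction entry point for the wrapper.
        return self._model.predict(X)

    def get_models(self):
        # Expose the underlying scikit-learn estimator so linear-model
        # optimizations can be applied post-ingestion.
        return self._model
```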
Click Next below to continue.