
Tutorial: Adding a Virtual Model

Stepwise coverage of TruEra's generally recommended flow for data and model ingestion is found in the Python SDK Tutorial: Local Compute Flow, available in TruEra's Notebook Library.

This tutorial presents an example of ingesting a virtual model via the Python SDK.

Data to be ingested include:

  • Input Data
  • Label Data
  • Prediction Data for provided virtual model
  • Feature Influence Data for provided virtual model

Before you begin

Set your TruEra URL and authentication token

  • Provide your TruEra deployment URL. Free users will use https://app.truera.net
  • Provide your Authentication Token, available here
  • Create your TruEra workspace object!
# FILL ME!
TRUERA_URL = "https://app.truera.net"
AUTH_TOKEN = ""
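
If you would rather not paste the token into the notebook, one option is to read both values from environment variables. The following is a minimal sketch, not part of the TruEra SDK; the variable names TRUERA_URL and TRUERA_AUTH_TOKEN and the helper load_truera_settings are hypothetical and chosen only for illustration.

```python
import os


def load_truera_settings(env=os.environ):
    """Return (url, token) read from an environment-like mapping.

    TRUERA_URL and TRUERA_AUTH_TOKEN are hypothetical variable names
    used for this sketch; the TruEra SDK does not define them.
    """
    # Fall back to the free-tier URL mentioned above when no URL is set.
    url = env.get("TRUERA_URL", "https://app.truera.net")
    token = env.get("TRUERA_AUTH_TOKEN", "")
    if not token:
        raise ValueError("Set TRUERA_AUTH_TOKEN before connecting.")
    return url, token
```

Calling TRUERA_URL, AUTH_TOKEN = load_truera_settings() would then fill in the same two names used in the rest of this tutorial.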

Install packages

! pip install truera
! pip install s3fs

Connect to your TruEra Endpoint

from truera.client.truera_workspace import TrueraWorkspace
from truera.client.truera_authentication import TokenAuthentication

auth = TokenAuthentication(AUTH_TOKEN)
tru = TrueraWorkspace(TRUERA_URL, auth)

tru.set_model_execution("local")

Adding a Project with Sample Data and Model

We have made a sample project, based on the Census Income datasets, available in a public S3 bucket.

The Census Income project, used throughout this quickstart tutorial, includes a formatted version of the data to illustrate the data ingestion process. For other frameworks, the process is similar.

Content in the census_income folder comprises:

  • quickstart_model.pkl – Pickled Python model for quickstart
  • data_raw.csv – training data, pre-transformed data (human-readable)
  • data_num.csv – data in model-readable form
  • label.csv – single column containing ground truth labels
  • extra_data.csv – used for defining segments
  • feature_influence.csv – feature influence data
  • predictions.csv – predictions from the model, computed locally

Next, to upload the data to TruEra, you'll need to:

  1. Create a TruEra project
  2. Define a data collection
  3. Create a background split for feature influences
  4. Add a virtual model
  5. Add split data, labels, predictions and feature influences
  6. Start using TruEra Diagnostics

Step 1. Create a TruEra project

tru.add_project("AdultCensus_DemoNB_local_ingestion", score_type="probits")

Step 2. Define a data collection

tru.add_data_collection("demo_data_collection")

Next, load each file to be uploaded into its own pandas DataFrame:

import os
import pandas as pd
import numpy as np

s3_folder = "s3://truera-examples/data/census_income/" # path where you download the census_income quickstart data

data = pd.read_csv(s3_folder + "data_num.csv")
labels = pd.read_csv(s3_folder + "label.csv")
predictions = pd.read_csv(s3_folder + "predictions.csv")
# Feature Influences can be computed using local explainer
feature_influence = pd.read_csv(s3_folder + "feature_influence.csv")
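
Because the later steps join these files on the "id" column, it can help to confirm that every file covers the same rows before uploading. The sketch below uses small toy DataFrames in place of the real files; the helper check_id_alignment is hypothetical and not part of the TruEra SDK.

```python
import pandas as pd


def check_id_alignment(frames, id_col="id"):
    """Return, per frame name, the ids present in the first frame but missing here."""
    names = list(frames)
    reference = set(frames[names[0]][id_col])
    return {
        name: sorted(reference - set(frames[name][id_col]))
        for name in names[1:]
    }


# Toy stand-ins for data_num.csv, label.csv and predictions.csv.
toy_data = pd.DataFrame({"id": [1, 2, 3], "age": [39, 50, 38]})
toy_labels = pd.DataFrame({"id": [1, 2, 3], "label": [0, 0, 1]})
toy_predictions = pd.DataFrame({"id": [1, 2], "score": [0.1, 0.2]})

missing = check_id_alignment(
    {"data": toy_data, "labels": toy_labels, "predictions": toy_predictions}
)
print(missing)
```

An empty list for every entry means all files share the ids of the first one; here the toy predictions frame is deliberately missing one row.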

Step 3. Create Background split for Feature Influence

from truera.client.ingestion import ColumnSpec

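# Ingest the feature data as the background split.
# Every column except the row id is passed as a feature column.
tru.add_data(
    data,
    data_split_name="background_split",
    column_spec=ColumnSpec(
        id_col_name="id",
        pre_data_col_names=[c for c in data.columns if c != "id"]
    )
)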

Step 4. Add the model

model_name = "quickstart_demo"
tru.add_model(model_name)

Step 5. Add split data, labels, predictions and feature influences

# Join features, labels and predictions on the row id and ingest them as one split.
tru.add_data(
    data.merge(labels, on="id").merge(predictions, on="id"),
    data_split_name="demo-all",
    column_spec=ColumnSpec(
        id_col_name="id",
        pre_data_col_names=[c for c in data.columns if c != "id"],
        label_col_names="label",
        prediction_col_names="score"
    )
)
# Add the feature influences to the same split, matched by row id.
tru.add_data(
    feature_influence,
    data_split_name="demo-all",
    column_spec=ColumnSpec(
        id_col_name="id",
        feature_influence_col_names=[c for c in feature_influence.columns if c != "id"]
    )
)
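
The merge on "id" in this step silently duplicates rows if any file contains a repeated id. As a hedged illustration on toy data (the column names mirror the ones used above, but the values are made up), pandas can enforce a one-to-one join through the validate argument of DataFrame.merge:

```python
import pandas as pd

# Toy stand-ins for the feature, label and prediction files.
toy_data = pd.DataFrame({"id": [1, 2], "age": [39, 50]})
toy_labels = pd.DataFrame({"id": [1, 2], "label": [0, 1]})
toy_predictions = pd.DataFrame({"id": [1, 2], "score": [0.12, 0.87]})

# validate="one_to_one" raises pandas.errors.MergeError on duplicate ids.
merged = (
    toy_data
    .merge(toy_labels, on="id", validate="one_to_one")
    .merge(toy_predictions, on="id", validate="one_to_one")
)

# Same pattern as the ColumnSpec above: every column except the id is a feature.
feature_cols = [c for c in toy_data.columns if c != "id"]
print(list(merged.columns), feature_cols)
```

The merged frame has one row per id, with the feature, label and score columns side by side, which is the shape passed to tru.add_data in this step.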

Validation

After the upload, you can validate the ingested data with the following calls:

To validate feature influence

tru.get_feature_influences()

To validate predictions

tru.get_ys_pred()

To validate label data

tru.get_ys()
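
To go one step further than eyeballing the output, the retrieved predictions can be compared with the local ones. The sketch below assumes both sides are available as DataFrames with an "id" column and a "score" column; the exact shape returned by tru.get_ys_pred() is not shown in this tutorial, so you may need to adapt it first (for example with reset_index()). The helper scores_match is hypothetical, and toy frames stand in for the real data.

```python
import pandas as pd


def scores_match(local, retrieved, id_col="id", score_col="score", tol=1e-9):
    """True when both frames hold the same ids and their scores agree within tol."""
    joined = local.merge(
        retrieved,
        on=id_col,
        how="outer",
        suffixes=("_local", "_remote"),
        indicator=True,
    )
    # Any row found on only one side means the id sets differ.
    if not (joined["_merge"] == "both").all():
        return False
    diff = (joined[f"{score_col}_local"] - joined[f"{score_col}_remote"]).abs()
    return bool((diff <= tol).all())


# Toy stand-ins: the retrieved frame has the same rows in a different order.
toy_local = pd.DataFrame({"id": [1, 2], "score": [0.1, 0.2]})
toy_retrieved = pd.DataFrame({"id": [2, 1], "score": [0.2, 0.1]})
print(scores_match(toy_local, toy_retrieved))
```

Joining on the id rather than comparing row by row keeps the check independent of the order in which rows come back.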