Python SDK Tutorial: Basic Local Compute Flow

In this notebook tutorial, we'll use TruEra's Python SDK to show a basic local compute flow.

What does "local compute" mean? What's different about it?

Sometimes it is preferable to run computations locally, especially when your model is hard to execute in a remote environment. In local compute mode, you use the SDK to analyze your model on your local machine. This gives you access to a limited set of TruEra features wherever the Python SDK is installed.

Before you begin ⬇️

⚠️ Make sure you have the truera SDK package installed before going through this tutorial. See the Installation Instructions for additional help. You will also need to install an explanation package of your choice. We recommend TruEra's truera-qii package if you have access to it; otherwise, you may use SHAP, which might be slower and less accurate.

👉 You can download and run this notebook by navigating to the Downloads page of your deployment and downloading the "Python SDK Local Quickstart" example notebook.

What we'll cover ☑️

  • Create a project with some split data and models.
  • Compare performance and explanations across models.

In this basic local compute flow, we're creating splits from basic pandas DataFrame objects.

Step 1: Connect to TruEra endpoint

What do I need to connect to my TruEra deployment?

  • TruEra deployment URL. For most users, the TruEra URL will take the form https://<your-truera-access-url>.
  • Some form of authentication (basic auth or token auth).

For examples on how to authenticate, see Authentication in the Diagnostics Quickstart. Here, we will use token authentication.

# FILL ME! 

TRUERA_URL = "<TRUERA_URL>"
TOKEN = '<AUTH_TOKEN>'
from truera.client.truera_workspace import TrueraWorkspace
from truera.client.truera_authentication import TokenAuthentication

auth = TokenAuthentication(TOKEN)
tru = TrueraWorkspace(TRUERA_URL, auth)
INFO:truera.client.remote_truera_workspace:Connecting to 'https://app.truera.net'
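Hardcoding a token in a notebook makes it easy to leak when the notebook is shared. One option is to fill the two placeholders from environment variables instead. The sketch below is a minimal illustration using only the standard library; the helper name and the variable names TRUERA_URL and TRUERA_TOKEN are assumptions made for this example and are not defined by the TruEra SDK.

```python
import os
from urllib.parse import urlparse


def load_connection_settings(env=os.environ):
    """Read the deployment URL and auth token from environment variables.

    TRUERA_URL and TRUERA_TOKEN are hypothetical names chosen for this
    sketch; use whatever naming fits your environment.
    """
    url = env.get("TRUERA_URL", "")
    token = env.get("TRUERA_TOKEN", "")

    # Reject values that are clearly not a web URL, such as an unfilled placeholder.
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Expected a deployment URL like https://<host>, got {url!r}")
    if not token:
        raise ValueError("TRUERA_TOKEN is empty or unset")

    # Drop any trailing slash so the URL has a single canonical form.
    return url.rstrip("/"), token


# In the notebook, this would replace the hardcoded placeholders:
# TRUERA_URL, TOKEN = load_connection_settings()
```

The returned values can then be passed to TokenAuthentication and TrueraWorkspace exactly as in the cell above.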

Step 2: Download sample data

Here we'll use scikit-learn's California housing dataset, which can be loaded via the sklearn.datasets module (fetch_california_housing downloads the data on first use).

# Retrieve the data.

import pandas as pd
from sklearn.datasets import fetch_california_housing

data_bunch = fetch_california_housing()
XS_ALL = pd.DataFrame(data=data_bunch["data"], columns=data_bunch["feature_names"])
YS_ALL = pd.DataFrame(data=data_bunch["target"], columns=["label"])
# Create train and test data splits (an integer test_size holds out exactly that many rows).

from sklearn.model_selection import train_test_split

XS_TRAIN, XS_TEST, YS_TRAIN, YS_TEST = train_test_split(XS_ALL, YS_ALL, test_size=100)
data_all = XS_ALL.merge(YS_ALL, left_index=True, right_index=True).reset_index(names="id")
data_test = XS_TEST.merge(YS_TEST, left_index=True, right_index=True).reset_index(names="id")
data_train = XS_TRAIN.merge(YS_TRAIN, left_index=True, right_index=True).reset_index(names="id")
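The merge and reset_index calls above can be seen in isolation on a tiny stand-in frame. The sketch below joins features and labels on their shared row index and then exposes that index as an explicit "id" column. Because train_test_split keeps the original row labels, the ids in the train and test frames match the ids of the same rows in the full frame. Note that the names argument of reset_index requires pandas 1.5 or later; the sketch uses rename_axis followed by reset_index, which should give the same result on older versions too. The toy values are made up for illustration.

```python
import pandas as pd

# Tiny stand-in frames with a shuffled index, as a train/test split would produce.
xs = pd.DataFrame({"HouseAge": [41.0, 21.0, 52.0]}, index=[7, 3, 5])
ys = pd.DataFrame({"label": [4.5, 3.6, 3.5]}, index=[7, 3, 5])

# Join features and labels on the shared row index.
merged = xs.merge(ys, left_index=True, right_index=True)

# Name the index "id" and turn it into a regular column.
with_id = merged.rename_axis("id").reset_index()

print(with_id.columns.tolist())  # ['id', 'HouseAge', 'label']
```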

Step 3: Create a project

Note how similar this is to the remote flow demonstrated in the Diagnostics Quickstart. The notable difference is that we set model execution to local with tru.set_model_execution("local"); the subsequent commands are nearly identical.

tru.set_model_execution("local")
tru.add_project("California Housing-2", score_type="regression")
INFO:truera.client.truera_workspace:Model execution environment set to 'local'

Step 4: Add the data collection and data split

Here we're adding data via simple pandas DataFrames.

from truera.client.ingestion import ColumnSpec

column_spec = ColumnSpec(
    id_col_name="id",
    pre_data_col_names=XS_ALL.columns.to_list(),
    label_col_names=YS_ALL.columns.to_list()
)
tru.add_data_collection("sklearn_data")
tru.add_data(data_all, data_split_name="all", column_spec=column_spec)
tru.add_data(data_train, data_split_name="train", column_spec=column_spec)
tru.add_data(data_test, data_split_name="test", column_spec=column_spec)
Uploading tmpfnulj0_c.parquet (927.3KiB) -- ### -- file upload complete.
Put resource done.

INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...
INFO:truera.client.remote_truera_workspace:Materialize operation id: 7b3ddd3b-98fd-4c6b-9066-ff8b48e4beeb finished with status: SUCCEEDED.

Uploading tmp0qg4kbic.parquet (933.0KiB) -- ### -- file upload complete.
Put resource done.

INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...
INFO:truera.client.remote_truera_workspace:Materialize operation id: ef4b9ea8-e310-438f-8bbd-c0f4db42694f finished with status: SUCCEEDED.

Uploading tmp2hfxwk0i.parquet (13.6KiB) -- ### -- file upload complete.
Put resource done.

INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...
INFO:truera.client.remote_truera_workspace:Materialize operation id: 832e5316-e5a7-4fff-9e90-2ed9ac48a8b4 finished with status: SUCCEEDED.
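Each upload takes a moment to materialize, so it can be worth confirming locally that the column roles are consistent with the DataFrame before calling tru.add_data. The helper below is a hypothetical sketch for this tutorial, not part of the TruEra SDK. It checks only that every named column exists and that no column is assigned to two roles.

```python
def check_column_roles(columns, id_col, feature_cols, label_cols):
    """Return a list of problems; an empty list means the roles look consistent."""
    available = set(columns)
    roles = {
        "id": [id_col],
        "feature": list(feature_cols),
        "label": list(label_cols),
    }

    problems = []
    seen = {}  # column name -> first role it was assigned to
    for role, names in roles.items():
        for name in names:
            if name not in available:
                problems.append(f"{role} column {name!r} is missing from the data")
            if name in seen and seen[name] != role:
                problems.append(f"column {name!r} is used as both {seen[name]} and {role}")
            seen.setdefault(name, role)
    return problems


# With the frames from this tutorial, the call would look like:
# check_column_roles(data_all.columns, "id", XS_ALL.columns, YS_ALL.columns)
```

An empty result means the id, feature, and label names passed to ColumnSpec all refer to distinct, existing columns.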

Step 5: Train and add a linear regression model

# Train the model.

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

lr_model = LinearRegression()
lr_model.fit(XS_TRAIN, YS_TRAIN)
# Take the square root of the MSE; the squared=False argument was removed in scikit-learn 1.6.
print(f"RMSE = {mean_squared_error(YS_TEST, lr_model.predict(XS_TEST)) ** 0.5}")
RMSE = 0.553543059676275
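For reference, RMSE is the square root of the mean squared difference between labels and predictions, so it is expressed in the same units as the label. The California housing target is given in units of $100,000, so an RMSE of about 0.55 corresponds to a typical error of roughly $55,000. The sketch below computes the metric by hand with the standard library; the input values are made up so that the arithmetic is easy to follow.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Errors are 1 and 7, squared errors 1 and 49, mean 25, square root 5.
print(rmse([2.0, 10.0], [1.0, 3.0]))  # 5.0
```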

We can add the model itself via tru.add_python_model(), which accepts a number of out-of-the-box model frameworks.

# Add to TruEra workspace.

tru.add_python_model("linear regression", lr_model)
tru.compute_all()
INFO:truera.client.remote_truera_workspace:Uploading sklearn model: LinearRegression
WARNING:truera.client.services.aiq_client:The number of records returned will not be the exact number requested but in the neighborhood of the start and stop limit provided.
INFO:truera.client.remote_truera_workspace:Verifying model...
INFO:truera.client.remote_truera_workspace:✔️ Verified packaged model format.
INFO:truera.client.remote_truera_workspace:✔️ Loaded model in current environment.
INFO:truera.client.remote_truera_workspace:✔️ Called predict on model.
INFO:truera.client.remote_truera_workspace:✔️ Verified model output.
INFO:truera.client.remote_truera_workspace:Verification succeeded!

Uploading MLmodel (214.0B) -- ### -- file upload complete.
Uploading conda.yaml (210.0B) -- ### -- file upload complete.
Uploading tmp98t53eh8 (774.0B) -- ### -- file upload complete.
Uploading sklearn_regression_predict_wrapper.py (431.0B) -- ### -- file upload complete.
Uploading sklearn_regression_predict_wrapper.cpython-310.pyc (1.1KiB) -- ### -- file upload complete.
Put resource done.

INFO:truera.client.remote_truera_workspace:Model "linear regression" added and associated with data collection "sklearn_data". "linear regression" is set as the model in context.
INFO:truera.client.remote_truera_workspace:Model uploaded to: https://app.truera.net/home/p/California%20Housing-2/m/linear%20regression/
WARNING:truera.client.intelligence.remote_explainer:Background split for `data_collection` "sklearn_data" is currently not set. Setting it to "all"
INFO:truera.client.truera_workspace:Downloading artifacts to temp_dir: /var/folders/6g/rp51n4c10mldf_61mqzc0mc00000gn/T/tmpxe39sgyk
INFO:truera.client.truera_workspace:Downloading model linear regression...

Uploading tmpf_zwt6vg.parquet (337.3KiB) -- ### -- file upload complete.
Put resource done.

INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...
INFO:truera.client.remote_truera_workspace:Materialize operation id: c428fe02-f1b9-48d5-b79b-249ea1ed1789 finished with status: SUCCEEDED.
INFO:truera.client.truera_workspace:Downloading artifacts to temp_dir: /var/folders/6g/rp51n4c10mldf_61mqzc0mc00000gn/T/tmpxe39sgyk
INFO:truera.client.truera_workspace:Downloading model linear regression...

Uploading tmp7qpo31hg.parquet (85.3KiB) -- ### -- file upload complete.
Put resource done.

INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...
INFO:truera.client.remote_truera_workspace:Materialize operation id: fe594ba2-4399-4d50-a19e-935bcf7f83f0 finished with status: SUCCEEDED.
INFO:truera.client.truera_workspace:Inferred error `score_type` to be "mean_absolute_error_for_regression"
INFO:truera.client.truera_workspace:Downloading artifacts to temp_dir: /var/folders/6g/rp51n4c10mldf_61mqzc0mc00000gn/T/tmpxe39sgyk
INFO:truera.client.truera_workspace:Downloading model linear regression...

Uploading tmp2dgiv0tr.parquet (85.3KiB) -- ### -- file upload complete.
Put resource done.

INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...
INFO:truera.client.remote_truera_workspace:Materialize operation id: b6c1d667-9cc4-4c84-8c43-557efceb0b31 finished with status: SUCCEEDED.
INFO:truera.client.truera_workspace:Downloading artifacts to temp_dir: /var/folders/6g/rp51n4c10mldf_61mqzc0mc00000gn/T/tmpxe39sgyk
INFO:truera.client.truera_workspace:Downloading model linear regression...

Uploading tmpfq_2bz43.parquet (335.7KiB) -- ### -- file upload complete.
Put resource done.

INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...
INFO:truera.client.remote_truera_workspace:Materialize operation id: 6696a72e-cfbd-4496-8c8c-f25a281620e0 finished with status: SUCCEEDED.
INFO:truera.client.truera_workspace:Downloading artifacts to temp_dir: /var/folders/6g/rp51n4c10mldf_61mqzc0mc00000gn/T/tmpxe39sgyk
INFO:truera.client.truera_workspace:Downloading model linear regression...

Uploading tmpjycxue1q.parquet (85.3KiB) -- ### -- file upload complete.
Put resource done.

INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...
INFO:truera.client.remote_truera_workspace:Materialize operation id: 1b7064bb-703f-4a2e-9fe9-fe54fe5189ee finished with status: SUCCEEDED.
INFO:truera.client.truera_workspace:Inferred error `score_type` to be "mean_absolute_error_for_regression"
INFO:truera.client.truera_workspace:Downloading artifacts to temp_dir: /var/folders/6g/rp51n4c10mldf_61mqzc0mc00000gn/T/tmpxe39sgyk
INFO:truera.client.truera_workspace:Downloading model linear regression...

Uploading tmp6y4i8lys.parquet (85.3KiB) -- ### -- file upload complete.
Put resource done.

INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...
INFO:truera.client.remote_truera_workspace:Materialize operation id: fc29854e-b5d8-42b1-92af-4fafd510f17c finished with status: SUCCEEDED.
INFO:truera.client.truera_workspace:Downloading artifacts to temp_dir: /var/folders/6g/rp51n4c10mldf_61mqzc0mc00000gn/T/tmpxe39sgyk
INFO:truera.client.truera_workspace:Downloading model linear regression...

Uploading tmpi6kzbnh7.parquet (3.7KiB) -- ### -- file upload complete.
Put resource done.

INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...
INFO:truera.client.remote_truera_workspace:Materialize operation id: 8fbcef8e-11b0-4d0d-8cb2-c1643b39477c finished with status: SUCCEEDED.
INFO:truera.client.truera_workspace:Downloading artifacts to temp_dir: /var/folders/6g/rp51n4c10mldf_61mqzc0mc00000gn/T/tmpxe39sgyk
INFO:truera.client.truera_workspace:Downloading model linear regression...

Uploading tmpi48qsp3y.parquet (14.3KiB) -- ### -- file upload complete.
Put resource done.

INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...
INFO:truera.client.remote_truera_workspace:Materialize operation id: 9cb8a7dd-c65a-4a66-99a2-d22b54bd1072 finished with status: SUCCEEDED.
INFO:truera.client.truera_workspace:Inferred error `score_type` to be "mean_absolute_error_for_regression"
INFO:truera.client.truera_workspace:Downloading artifacts to temp_dir: /var/folders/6g/rp51n4c10mldf_61mqzc0mc00000gn/T/tmpxe39sgyk
INFO:truera.client.truera_workspace:Downloading model linear regression...

Uploading tmptgmg2ovm.parquet (14.3KiB) -- ### -- file upload complete.
Put resource done.

INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...
INFO:truera.client.remote_truera_workspace:Materialize operation id: a0b70f44-4b4b-4382-9a4c-6ebaaa5e2ee5 finished with status: SUCCEEDED.
INFO:truera.client.remote_truera_workspace:Data collection in workspace context set to "sklearn_data".
INFO:truera.client.remote_truera_workspace:Setting model context to "linear regression".

# View the influence sensitivity plot (ISP) for the HouseAge feature.

tru.get_explainer(base_data_split="test").plot_isp(feature='HouseAge')
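As a rough intuition for what to expect in this plot: an ISP shows each point's feature value against that feature's influence on the model output. For a linear model, a Shapley-style influence of a single feature reduces to the coefficient times the distance of the value from the background mean, assuming features are treated as independent. Under that assumption the ISP for HouseAge would be a straight line. The sketch below illustrates the calculation only and is not TruEra's implementation; the coefficient and feature values are made up.

```python
def linear_influences(values, coefficient, background):
    """Per-point influence of one feature in a linear model.

    Uses coefficient * (value - mean of the background values), which is
    the feature's Shapley value in a linear model when features are
    treated as independent.
    """
    baseline = sum(background) / len(background)
    return [coefficient * (v - baseline) for v in values]


# Illustrative numbers only; 0.5 is not the fitted HouseAge coefficient.
house_age = [10.0, 20.0, 30.0]
influences = linear_influences(house_age, coefficient=0.5, background=house_age)

# Each pair is one point on the plot: (feature value, influence).
print(list(zip(house_age, influences)))  # [(10.0, -5.0), (20.0, 0.0), (30.0, 5.0)]
```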