Python SDK Tutorial: Fairness Analysis¶
Fairness analyses give insight into whether your model is fair (using a metric and population that make sense) and how to debug biases when they arise. This notebook covers how one might use TruEra's Python SDK to analyze and mitigate unfairness present in a model.
Before you begin¶
- Install the TruEra Python SDK
- Complete the Diagnostics Quickstart
- Check our primer on Explainer objects
What we'll cover ☑️¶
- In this tutorial, we'll use a project and model already uploaded to our demo server: the Adult_Census project, which predicts whether an individual's income is above a certain threshold based on some demographic factors. This is a classic dataset used in fairness analyses. We also provide a version of the Census dataset with our Quickstart data, which you can get from your deployment's Downloads page.
- We'll evaluate the fairness of our model using one metric.
- We'll then dig into the primary driver of the unfairness to understand why the model is exhibiting a bias.
To ingest the Adult_Census project yourself, follow the instructions in the Diagnostics Quickstart.
Step 1: Connect to TruEra endpoint¶
What do I need to connect?¶
- TruEra deployment URI (the connection string).
- Some form of authentication (basic auth or token auth).
For most users, the TruEra URI will take the form http://
For examples of how to authenticate, see the Authentication section of the Diagnostics Quickstart. Here, we will use basic authentication with a username and password.
CONNECTION_STRING = "<CONNECTION_STRING>"
USERNAME = "<USERNAME>"
PASSWORD = "<PASSWORD>"
from truera.client.truera_workspace import TrueraWorkspace
from truera.client.truera_authentication import BasicAuthentication
auth = BasicAuthentication(USERNAME, PASSWORD)
tru = TrueraWorkspace(CONNECTION_STRING, auth)
Step 2: Connect to the example Adult_Census project¶
tru.set_project("Adult_Census")
tru.set_model("V1 - all") # this implicitly sets the data collection as well
Step 3: Evaluate the fairness of the model for men vs. women¶
Let's first get a quick summary to see whether there's enough bias between men and women in the model to be concerning. You can see here that we are computing fairness on a predefined segment group that contains individual segments for both Males and Females. We can also provide a fairness metric (or fall back to the configured fairness metric defaults).
In this case we use DISPARATE_IMPACT_RATIO, which measures the ratio of the selection rates between men and women, but you can browse through our supported metrics to find a fairness metric that is appropriate for your situation.
For guidance on creating these segments for your own project, see Segments.
explainer = tru.get_explainer(base_data_split="adult-base-dataset-split-all")
bias_result = explainer.compute_fairness(
    segment_group="adult-segment-gender",
    segment1="Male",
    segment2="Female",
    fairness_type="DISPARATE_IMPACT_RATIO",
)
bias_result
What information is in a BiasResult object?¶
- segment1_name and segment2_name refer to the segment names we used in the analysis (Male and Female).
- The aggregate_metric (disparate impact ratio, in this case) is 3.4.
- The individual selection rates for segments 1 and 2, used to calculate the aggregate_metric, are shown under segment1_metric and segment2_metric. As our metric of choice is a ratio, the aggregate_metric is equal to segment1_metric / segment2_metric.
- The favored_segment shows which segment is advantaged by the model, based on how to interpret the model score.
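As a quick check, we can read these fields back programmatically. The sketch below assumes the returned BiasResult exposes them as plain attributes with the names listed above; adjust the access pattern if your SDK version surfaces them differently.
# read the fields described above off the BiasResult (attribute names assumed from the printed result)
print(f"Favored segment: {bias_result.favored_segment}")
print(f"{bias_result.segment1_name} selection rate: {bias_result.segment1_metric}")
print(f"{bias_result.segment2_name} selection rate: {bias_result.segment2_metric}")
# for a ratio metric, the aggregate equals segment1_metric / segment2_metric
print(f"Disparate impact ratio: {bias_result.aggregate_metric}")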
This does seem somewhat concerning: the Male segment is favored, and the disparate impact ratio (aggregate_metric) is heavily skewed, given that the ideal value is 1. To look into this further, let's examine the distributions of the Male and Female segments. We can do this using the Explainer.set_segment() method.
male_explainer = tru.get_explainer(base_data_split="adult-base-dataset-split-all")
male_explainer.set_segment("adult-segment-gender", segment_name="Male")
female_explainer = tru.get_explainer(base_data_split="adult-base-dataset-split-all")
female_explainer.set_segment("adult-segment-gender", segment_name="Female")
# male segment
print(male_explainer.get_xs().describe(include="all").T.to_markdown())
# female segment
print(female_explainer.get_xs().describe(include="all").T.to_markdown())
One notable difference here between men and women appears to be the marital-status feature. Perhaps this could help explain the discrepancy? Let's take a look.
male_explainer.get_xs()["marital-status"].value_counts()
female_explainer.get_xs()["marital-status"].value_counts()
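To compare the two segments side by side, we can normalize these counts into proportions. This is a small pandas sketch built on the same get_xs() data, not a TruEra-specific API.
import pandas as pd

# proportion of each marital-status value within the male and female segments
marital_by_segment = pd.concat(
    {
        "Male": male_explainer.get_xs()["marital-status"].value_counts(normalize=True),
        "Female": female_explainer.get_xs()["marital-status"].value_counts(normalize=True),
    },
    axis=1,
).fillna(0)
print(marital_by_segment.to_markdown())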
There appear to be far more married men than married women. This might help explain the discrepancy, but it's unclear how important it is. Let's take a look at the influence sensitivity plots (ISPs).
explainer.plot_isp("marital-status")
Overall, being married seems to push the model strongly toward predicting an income over 50k (the "one" class), which may help explain the male/female discrepancy. More investigation is needed!
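One way to dig further is to plot the same ISP separately for each gender segment, reusing the segment-scoped explainers created earlier. This assumes plot_isp respects the segment set on the explainer.
# marital-status ISPs restricted to each gender segment
male_explainer.plot_isp("marital-status")
female_explainer.plot_isp("marital-status")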