Python SDK Tutorial: Fairness Analysis¶
Fairness analyses give insight into whether your model is fair (using a metric and population that make sense) and how to debug biases when they arise. This notebook covers how one might use TruEra's Python SDK to analyze and mitigate unfairness present in a model.
Before you begin¶
- Install the TruEra Python SDK
- Complete the Diagnostics Quickstart
- Check our primer on Explainer objects
What we'll cover ☑️¶
- In this tutorial, we'll use a project and model already uploaded to our demo server: the Adult_Census project, which predicts whether an individual's income is above a certain threshold based on some demographic factors. This is a classic dataset used in fairness analyses. We also provide a version of the Census dataset with our Quickstart data, which you can get from your deployment's Downloads page.
- We'll evaluate the fairness of our model using one metric.
- We'll then dig into the primary driver of the unfairness to understand why the model is exhibiting a bias.
To ingest the Adult_Census project yourself, follow the instructions in the Diagnostics Quickstart.
Step 1: Connect to TruEra endpoint¶
What do I need to connect?¶
- TruEra deployment URI (the connection string).
- Some form of authentication (basic auth or token auth).
For most users, the TruEra URI will take the form http://
For examples of how to authenticate, see the Authentication section of the Diagnostics Quickstart. Here, we will use basic authentication with a username and password.
CONNECTION_STRING = "<CONNECTION_STRING>"
USERNAME = "<USERNAME>"
PASSWORD = "<PASSWORD>"
from truera.client.truera_workspace import TrueraWorkspace
from truera.client.truera_authentication import BasicAuthentication
auth = BasicAuthentication(USERNAME, PASSWORD)
tru = TrueraWorkspace(CONNECTION_STRING, auth)
Step 2: Connect to the example Adult_Census project¶
tru.set_project("Adult_Census")
tru.set_model("V1 - all") # this implicitly sets the data collection as well
Step 3: Evaluate the fairness of the model for men vs. women¶
Let's first get a quick summary to see whether there's enough bias between men and women in the model to be concerning. You can see here that we are computing fairness on a predefined segment group that contains individual segments for both Males and Females. We can also provide a fairness metric (or fall back to the configured fairness metric defaults).
In this case we use DISPARATE_IMPACT_RATIO, which measures the ratio of the selection rates between men and women, but you can browse through our supported metrics to find a fairness metric that is appropriate for your situation.
For guidance on creating these segments for your own project, see Segments.
explainer = tru.get_explainer(base_data_split="adult-base-dataset-split-all")
bias_result = explainer.compute_fairness(
    segment_group="adult-segment-gender",
    segment1="Male",
    segment2="Female",
    fairness_type="DISPARATE_IMPACT_RATIO",
)
bias_result
What information is in a BiasResult object?¶
- segment1_name and segment2_name refer to the segment names we used in the analysis (Male and Female).
- The aggregate_metric (disparate impact ratio, in this case) is 3.4.
- The individual selection rates for segments 1 and 2, used to calculate the aggregate_metric, are shown under segment1_metric and segment2_metric. As our metric of choice is a ratio, the aggregate_metric is equal to segment1_metric / segment2_metric.
- The favored_segment shows which segment is advantaged by the model, based on how to interpret the model score.
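As a quick check, we can read these fields back programmatically. The sketch below assumes the returned BiasResult exposes them as plain attributes with the names listed above; adjust the access pattern if your SDK version surfaces them differently.
# read the fields described above off the BiasResult (attribute names assumed from the printed result)
print(f"Favored segment: {bias_result.favored_segment}")
print(f"{bias_result.segment1_name} selection rate: {bias_result.segment1_metric}")
print(f"{bias_result.segment2_name} selection rate: {bias_result.segment2_metric}")
# for a ratio metric, the aggregate equals segment1_metric / segment2_metric
print(f"Disparate impact ratio: {bias_result.aggregate_metric}")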
This does seem somewhat concerning: the Male segment is favored, and the disparate impact ratio (aggregate_metric) is heavily skewed, given that the ideal value is 1. To look into this further, let's examine the distributions of the Male and Female segments. We can do this using the Explainer.set_segment() method.
male_explainer = tru.get_explainer(base_data_split="adult-base-dataset-split-all")
male_explainer.set_segment("adult-segment-gender", segment_name="Male")
female_explainer = tru.get_explainer(base_data_split="adult-base-dataset-split-all")
female_explainer.set_segment("adult-segment-gender", segment_name="Female")
# male segment
print(male_explainer.get_xs().describe(include="all").T.to_markdown())
# female segment
print(female_explainer.get_xs().describe(include="all").T.to_markdown())
One notable difference here between men and women appears to be the marital-status feature. Perhaps this could help explain the discrepancy? Let's take a look.
male_explainer.get_xs()["marital-status"].value_counts()
female_explainer.get_xs()["marital-status"].value_counts()
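To compare the two segments side by side, we can normalize these counts into proportions. This is a small pandas sketch built on the same get_xs() data, not a TruEra-specific API.
import pandas as pd

# proportion of each marital-status value within the male and female segments
marital_by_segment = pd.concat(
    {
        "Male": male_explainer.get_xs()["marital-status"].value_counts(normalize=True),
        "Female": female_explainer.get_xs()["marital-status"].value_counts(normalize=True),
    },
    axis=1,
).fillna(0)
print(marital_by_segment.to_markdown())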
There appear to be far more married men than married women. This might help explain the discrepancy, but it's unclear how important it is. Let's take a look at the influence sensitivity plots (ISPs).
explainer.plot_isp("marital-status")
Overall, being married seems to push the model strongly toward predicting an income over 50k (the "one" class), which may help explain the male/female discrepancy. More investigation is needed!
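One way to dig further is to plot the same ISP separately for each gender segment, reusing the segment-scoped explainers created earlier. This assumes plot_isp respects the segment set on the explainer.
# marital-status ISPs restricted to each gender segment
male_explainer.plot_isp("marital-status")
female_explainer.plot_isp("marital-status")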