Interesting/High-Error Segments: Finding Hotspots
Segmentation is an analytical technique useful for identifying and isolating interesting and/or high-error data points so that you can take targeted, corrective measures during model performance debugging. It entails dividing and organizing (segmenting) your data into defined groups having one or more characteristics in common in order to find points at variance with overall model output.
However, manually sifting through a data split, large or small, for points that can be construed as "interesting" or "high-error" is a time-consuming and often cumbersome task requiring many choices along the way, starting with what constitutes "interesting" and how high a "high" error rate is. In other words, what are your segmentation criteria? Which features that share a value or fall within a particular range should be included?
The TruEra Python SDK's find_hotspots() method simplifies and automates precisely this type of manual exploration.
How does it work?
Given user-specified parameters, find_hotspots() searches features or sets of features in a greedy fashion to return segments that maximize or minimize the "metric of interest" you specify.
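As an illustration only (not TruEra's internal implementation), a greedy search of this kind can be sketched in plain Python: at each step, score every candidate (feature, value) condition, keep the one whose segment maximizes the aggregated metric, then try to refine that segment with another condition.

```python
def aggregate(metric, mask, size_exponent=0.0):
    """Aggregate a pointwise metric over the points selected by mask."""
    n = sum(mask)
    if n == 0:
        return float("-inf")
    total = sum(m for m, keep in zip(metric, mask) if keep)
    # size_exponent = 0 -> mean over the segment; size_exponent = 1 -> sum
    return total / n ** (1 - size_exponent)

def greedy_hotspot(features, metric, num_features=2, minimum_size=2):
    """Greedily build a segment (a conjunction of feature == value conditions)
    that maximizes the aggregated pointwise metric."""
    mask = [True] * len(metric)  # start with the whole split
    conditions = []
    for _ in range(num_features):
        best = None
        for name, column in features.items():
            if any(name == c[0] for c in conditions):
                continue  # each feature is used at most once
            for value in set(column):
                cand = [m and v == value for m, v in zip(mask, column)]
                if sum(cand) < minimum_size:
                    continue  # enforce a minimum segment size
                score = aggregate(metric, cand)
                if best is None or score > best[0]:
                    best = (score, name, value, cand)
        if best is None:
            break  # no refinement satisfies the constraints
        _, name, value, mask = best
        conditions.append((name, value))
    return conditions, mask
```

For example, with a pointwise squared-error metric, the search isolates the high-error slice: `greedy_hotspot({"region": ["N", "N", "S", "S", "S", "N"], "plan": ["a", "b", "a", "a", "b", "b"]}, [0.1, 0.2, 5.0, 4.0, 0.3, 0.1])` returns the conditions `[("region", "S"), ("plan", "a")]`.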
Note
Here, the term "hotspots" is interchangeable with "interesting/high-error segments."
Pointwise Metric Calculation
To calculate a given metric on various segments in a split, it's necessary to establish a pointwise metric over which to aggregate. This could be, for example:

- Pointwise classification accuracy – assign the value 1 to correctly classified points; assign all other points a value of 0.
- Pointwise squared error – assign each point the squared difference between its prediction and its ground truth.
- Pointwise precision – maintain two indicator lists, numerator and denominator; a point belonging to a list is assigned a value of 1 in that list; all other points receive a value of 0.
Pointwise metrics can then be aggregated for various segments, allowing a comparison across different segments in order to return the most "interesting" ones, where "interesting" correlates to higher or lower depending on the metric of interest.
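As a concrete sketch of the pointwise metrics above (illustrative helper functions, not part of the SDK):

```python
def pointwise_accuracy(y_true, y_pred):
    # 1 for correctly classified points, 0 for all others
    return [1 if t == p else 0 for t, p in zip(y_true, y_pred)]

def pointwise_squared_error(y_true, y_pred):
    # squared difference between each point's prediction and its ground truth
    return [(p - t) ** 2 for t, p in zip(y_true, y_pred)]

def pointwise_precision(y_true, y_pred, positive=1):
    # numerator membership: true positives; denominator membership: predicted positives
    numerator = [1 if p == positive and t == positive else 0
                 for t, p in zip(y_true, y_pred)]
    denominator = [1 if p == positive else 0 for p in y_pred]
    return numerator, denominator
```

Each helper returns one value per point, which is exactly the form needed for the segment-wise aggregation described next.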
Pointwise Metric Aggregation: Simple Mean
Aggregation of pointwise metrics depends on the metric of interest. For aggregated metrics expressed as the mean of the pointwise metrics, TruEra uses the following approach.
Denoting a set of model predictions on a segment as S of size |S| = N_{segment} and P(x_i) as the pointwise metric for the ith point in the segment, the aggregated metric for a segment can be given as:

$$M_{segment} = \frac{1}{N_{segment}^{\,1-s}} \sum_{i=1}^{N_{segment}} P(x_i)$$

where s is the "size exponent" factor to help scale M_{segment} by the size of the segment.
If a comparison_data_split_name is provided, then the following mean aggregation is used. Denoting split A and split B as the base and comparison splits, respectively, and the size of the segment in these splits as N_{A} and N_{B}, the aggregated mean metric can be given as:
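A minimal sketch of the base-split mean aggregation (illustrative, using the size_exponent behavior described under Parameters: s = 0 yields the mean and s = 1 yields the sum):

```python
def aggregate_mean(pointwise, s=0.0):
    # M_segment = sum(P(x_i)) / N_segment ** (1 - s)
    # s = 0 -> plain mean over the segment; s = 1 -> sum of pointwise metrics
    n = len(pointwise)
    return sum(pointwise) / n ** (1 - s)
```

For pointwise accuracies [1, 0, 1, 0], s = 0 gives 0.5 (the mean), while s = 1 gives 2 (the sum), so larger s rewards larger segments with the same per-point metric.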
Pointwise Metric Aggregation: Confusion Matrix
With respect to metrics derived from the confusion matrix (e.g., precision, recall, true/false positive/negative rate), the aggregation method must take into account the numerator and the denominator of the metric of interest. For instance, given the definition of precision

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

the pointwise metrics denote membership in the numerator (e.g., TP for precision) and the denominator (e.g., TP + FP for precision) of the corresponding confusion-matrix metric. Therefore, denoting a set of model predictions on a segment as S of size |S| = N_{segment} and functions n(x_i) and d(x_i), which indicate numerator and denominator membership of the ith point in the segment, the aggregated metric for a segment can be given as:

$$M_{segment} = \frac{\sum_{i=1}^{N_{segment}} n(x_i)}{\sum_{i=1}^{N_{segment}} d(x_i)}$$
If a comparison_data_split_name is provided, then the following numerator/denominator aggregation is used. Denoting the size of the segment in these splits as N_{A} and N_{B}, the aggregated mean metric can be given as:
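The single-split numerator/denominator aggregation can be sketched as follows (illustrative helper, using pointwise precision as the example):

```python
def aggregate_ratio(numerator, denominator):
    # Segment-level confusion-matrix metric, e.g. precision:
    # sum of TP indicators / sum of predicted-positive (TP + FP) indicators.
    d = sum(denominator)
    return sum(numerator) / d if d else float("nan")
```

With numerator indicators [1, 0, 1, 0] and denominator indicators [1, 1, 1, 0], the segment precision is 2/3; note that summing before dividing is what distinguishes this from a simple mean of pointwise ratios.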
Parameters
Although the find_hotspots() method definition in the Python SDK Technical Reference provides the full list of parameters, here is some additional context:

- size_exponent – float in the range [0, 1] which encourages the method to return smaller segments. Looking at the equation above, we see that size_exponent = 0 results in the mean value over the segment, while size_exponent = 1 results in the sum of the pointwise metrics in the segment.
- comparison_data_split_name – defaults to None; required for metric_of_interest = UNDER_OR_OVERSAMPLING; optional for any other metric of interest.
- metric_of_interest – name of the metric to use when searching for hotspots. Allowable values for classification and regression projects are listed in the tables that follow.
Classification Metrics of Interest
| Metric | Description | Notes |
| --- | --- | --- |
| SEGMENT_GENERALIZED_AUC | common threshold-independent metric | default |
| CLASSIFICATION_ACCURACY | common classification metric | |
| LOG_LOSS | threshold-independent metric | |
| PRECISION | confusion-matrix derivative metric | |
| RECALL | confusion-matrix derivative metric | |
| TRUE_POSITIVE_RATE | confusion-matrix derivative metric | |
| FALSE_POSITIVE_RATE | confusion-matrix derivative metric | |
| TRUE_NEGATIVE_RATE | confusion-matrix derivative metric | |
| FALSE_NEGATIVE_RATE | confusion-matrix derivative metric | |
| UNDER_OR_OVERSAMPLING | model-agnostic, data-dependent | experimental* |
Regression Metrics of Interest
| Metric | Description | Notes |
| --- | --- | --- |
| MEAN_ABSOLUTE_ERROR | common regression metric | default |
| MEAN_SQUARED_ERROR | common regression metric | |
| MEAN_SQUARED_LOG_ERROR | useful when the range of the target is large | |
| UNDER_OR_OVERSAMPLING | model-agnostic, data-dependent | experimental* |
Under/Oversampling Metric of Interest (Experimental)
UNDER_OR_OVERSAMPLING is a data-dependent metric of interest TruEra exposes to search for interesting segments based on the percentage difference between segment sizes in two data splits. Please note that this metric requires a comparison split in addition to the explainer's base split.
An under/oversampled segment will have a high size diff (%) value, which is the absolute difference between segment sizes as a percentage of split sizes, formally defined as:

$$\text{size diff (\%)} = \left| \frac{N^{A}_{segment}}{N^{A}_{split}} - \frac{N^{B}_{segment}}{N^{B}_{split}} \right| \times 100$$

where A and B denote the two splits to compare, N_{segment} denotes the number of points in the segment, and N_{split} denotes the number of points in the split.
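The size diff (%) computation can be sketched directly from the definition above (illustrative helper, not an SDK function):

```python
def size_diff_pct(n_segment_a, n_split_a, n_segment_b, n_split_b):
    # Absolute difference between the segment's share of split A and its
    # share of split B, expressed as a percentage.
    return abs(n_segment_a / n_split_a - n_segment_b / n_split_b) * 100
```

For example, a segment covering 300 of 1,000 points in split A but only 100 of 1,000 points in split B has a size diff of about 20%, flagging it as under/oversampled.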
Hence, when metric_of_interest=UNDER_OR_OVERSAMPLING, the output of find_hotspots() reports the size diff (%) value for each returned segment.
Important
The find_hotspots() method requires that a model be defined on the explainer, even though calculating UNDER_OR_OVERSAMPLING does not require a model.
"What If" Metric (Experimental)
At one point or another, you'll undoubtedly ask yourself whether it's worth your time to take action on these "interesting" segments. To help you assess a proposed segment's actionability, TruEra denotes a 'what if' metric, which estimates the split-level metric as if the segment's points were excluded:

$$M_{\mathit{what\,if}} = \frac{N_{split} \cdot M_{split} - N_{segment} \cdot M_{segment}}{N_{split} - N_{segment}}$$

where M denotes the metric on a set of points and N denotes the number of points.
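Assuming the 'what if' metric is the split-level metric recomputed as if the segment's points were excluded (an interpretation consistent with its restriction to averages of pointwise metrics, but not spelled out here), the calculation can be sketched as:

```python
def what_if_metric(m_split, n_split, m_segment, n_segment):
    # Back out the segment's contribution from the split-level mean metric.
    # Only valid when the split metric is a mean of pointwise metrics,
    # so that n_split * m_split is the total pointwise sum.
    return (n_split * m_split - n_segment * m_segment) / (n_split - n_segment)
```

For example, if a split of 1,000 points has accuracy 0.90 and a segment of 100 points has accuracy 0.50, excluding the segment would leave accuracy of roughly 0.944, suggesting the segment is worth acting on.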
When requested using the show_what_if_performance parameter, the 'what if' metric is returned in addition to each segment-wise metric.
Keep in mind that the 'what if' metric can only be defined for a metric_of_interest expressed as the average of a linear combination of pointwise metrics (e.g., classification accuracy, mean squared error). This means that certain metrics of interest (e.g., AUC, precision) will not return a 'what if' metric even if requested.
Here's the current list of supported "what if" metrics:
Classification:
- Classification Accuracy
- Log Loss

Regression:
- Mean Absolute Error
- Mean Squared Error
- Mean Squared Log Error
See Performance Metrics for definitions.
Important Caveats Regarding Web App Support
With the release of TruEra v1.33, the TruEra Web App's support for find_hotspots() exposes only these parameters:

- num_features
- minimum_size
- metric_of_interest

Also, because the Web App's find_hotspots() workflow does not currently use a comparison split, the experimental metric of interest UNDER_OR_OVERSAMPLING is not enabled.