Exploring Dashboard Panels

As discussed in Creating a New Dashboard, panels are organized into distinct categories, comprising: Model Output, Labels, Model Performance, Data Input, Data Quality, and Custom panels.

Each is next discussed in the context of Model Type — Regression, Classification, or Common (applied to both regression and classification analyses).

Regression Model Panels

Regression models describe the relationship between one or more independent variables and a target variable.

A brief description and example of each regression model panel, organized by panel type, follows.

Tip

You can inspect the selected model for a panel in TruEra Diagnostics by clicking DIAGNOSTICS ↗ at the top-right of the panel.

By clicking the export icon, you can export the panel's content to a CSV file for download and import into a spreadsheet or other third-party application.

Model Output

Mean

Tracks the average model score for each model over time.

[Figure: Model Output: Mean]

Volume

Tracks the volume of output for each model over time.

[Figure: Model Output: Volume]

Predictions and Labels

Tracks model results against the respective ground truth label, when available, over time.

Distribution

Tracks the distribution of scores for each model and for the labels for those models over the user-defined time range.

[Figure: Model Output: Distributions]

Drift: Difference of Means

Tracks the absolute difference between means (production vs. baseline or known good time window) for each model over time.

[Figure: Model Output Drift: Difference of Means]
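
For intuition, here is a minimal sketch of the statistic (not TruEra's internal implementation), assuming each window's scores are available as arrays:

```python
import numpy as np

def difference_of_means(baseline_scores, production_scores):
    """Absolute difference between the production and baseline mean scores."""
    return abs(np.mean(production_scores) - np.mean(baseline_scores))

# Hypothetical score windows: a known-good baseline vs. one production window.
baseline = np.array([0.42, 0.45, 0.40, 0.44, 0.43])    # mean = 0.428
production = np.array([0.51, 0.48, 0.55, 0.50])        # mean = 0.510
print(difference_of_means(baseline, production))       # ~0.082
```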

Drift: Wasserstein

Tracks the Earth Mover's Distance (EMD) between production and baseline distributions for each model over time. EMD is the minimum amount of work (distance) required to match two distributions, normalized by the total weight of the lighter distribution.
[Figure: Model Output Drift: Wasserstein]
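
For intuition, SciPy provides a one-dimensional EMD implementation; a minimal sketch (not TruEra's implementation) comparing baseline and production score samples:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
baseline = rng.normal(loc=0.40, scale=0.10, size=1000)    # e.g., training scores
production = rng.normal(loc=0.50, scale=0.10, size=1000)  # shifted production scores

# One-dimensional Earth Mover's Distance between the two score samples.
# SciPy treats both samples as unit-weight distributions, so the result is
# already normalized; the lighter-distribution normalization matters when
# the two windows carry unequal total weight.
print(wasserstein_distance(baseline, production))  # ~0.1 (the mean shift)
```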

Labels

Labels, when ingested and available, indicate the ground truth or some other meaningful measure to track your model's results against.

Label Volume

Tracks the volume of ground truth labels over time.

[Figure: Label Volume]

Label Distribution

Reports the percentage of records over time with ground truth labels assigned.

Label Drift

Calculates the distance between the baseline (training split or known good production time window) and the selected time window using the specified distance metric.

Also known as "annotation drift" and "target drift," label drift is a problem that occurs when the labels or categories associated with a dataset change over time. This can happen for a variety of reasons ranging from changes in human judgment to the introduction of new categories to the merging or splitting of existing categories. It can also be caused by the target population distributions changing over time.
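
As a simple, hypothetical illustration of one such distance metric (total variation distance over class proportions; the panel uses whichever metric you specify):

```python
from collections import Counter

def total_variation(baseline_labels, window_labels):
    """Half the L1 distance between the two label distributions'
    class proportions; 0 means identical, 1 means disjoint."""
    p, q = Counter(baseline_labels), Counter(window_labels)
    n_p, n_q = sum(p.values()), sum(q.values())
    return 0.5 * sum(abs(p[c] / n_p - q[c] / n_q) for c in set(p) | set(q))

# Baseline is 90/10 in favor of class 0; the selected window drifted to 70/30.
print(total_variation([0] * 90 + [1] * 10, [0] * 70 + [1] * 30))  # 0.2
```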

Model Performance

Regression model performance — good or bad, acceptable vs. unacceptable — is reflected in the error rate of the model's predictions. Knowing how well the regression line fits the dataset is another indicator of performance.

A "good" regression model is one for which the difference between the actual or observed values and predicted values for the data introduced is small and unbiased. Hence, to determine "acceptable" performance, the important questions are: Which errors does the model make? Are there specific data segments where the model performs differently? Did anything change compared to training?

Should these and other potential issues arise in production, TruEra Monitoring gives you the option of capturing a subset of your production data time window for root cause analysis (RCA) in TruEra Diagnostics.

RMSE

Root Mean Square Error (RMSE), also known as root mean square deviation, is a commonly used regression metric for evaluating the quality of predictions. It measures the Euclidean distance between the prediction and the ground truth, providing an estimation of how accurately the model is able to predict the target value.

Because MSE values are expressed in squared units that can be too large for easy comparison, taking the square root brings the metric back to the scale of the prediction error itself, making RMSE easier to interpret.

[Figure: Root mean square error plot]

Because RMSE is not scale invariant, however, model comparisons using this measure can be affected by the scale of the data, so it’s generally wise to apply RMSE over standardized data only.

RMSE is helpful when you need a single number with which to judge a model's performance — during training and cross-validation and for monitoring a production deployment. Keep in mind that squaring numbers and calculating the mean can be heavily affected by a few predictions that deviate from the rest; i.e., outliers that significantly distort overall model output.
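
In code, the metric reduces to a one-liner; a minimal sketch assuming equal-length arrays of actuals and predictions:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared residual."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Hypothetical actuals vs. predictions.
print(rmse([3.0, 5.0, 2.5, 7.0], [2.5, 5.0, 4.0, 8.0]))  # ~0.935
```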

WMAPE

WMAPE (weighted mean absolute percentage error, sometimes written wMAPE) is another metric for evaluating the performance of regression models. A variant of MAPE, it weights absolute percentage errors by volume, yielding a more rigorous and reliable metric.

[Figure: Weighted mean absolute percentage error plot]

WMAPE is typically used to investigate the average error of your model predictions over time compared to what really happens.
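
A minimal sketch of the metric, assuming the common formulation in which total absolute error is divided by total actual volume:

```python
import numpy as np

def wmape(y_true, y_pred):
    """Weighted MAPE: sum of absolute errors over sum of absolute actuals."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true))

# Unlike plain MAPE, high-volume points dominate, so a miss on a tiny
# actual value no longer blows up the metric.
print(wmape([100, 10, 1], [90, 12, 3]))  # (10 + 2 + 2) / 111 ~ 0.126
```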

Classification Model Panels

Classification models predict a category, classifying a data point into a specific category/class (Yes/No, Spam/Not spam, Eligible/Ineligible, Qualified/Unqualified, etc.) and output a model score, often a probability within the range [0,1], after which a decision threshold is applied to yield a decision {0, 1}, generally mapped to classes: {fraud, not fraud}, {spam, not spam}, and so forth. Certain metrics look at the raw model score, while others look at the decision; i.e., after the threshold is applied.

In other words, certain panels, such as those for drift, display model scores (probit, a unit of probability based on deviation from the mean of a standard distribution, or logit, the quantile function associated with the standard logistic distribution). Others display the decision after the threshold is applied, represented numerically as a 0 or 1.

For example, a classification model for a lender might be designed to predict whether a customer is likely to default on a loan based on data contained in the customer's credit report/payment history when compared to the track record of other borrowers. Or, the prediction could be influenced by factors other than credit score — like education, income, length of current employment, time living at the same address, age, marital status, number of dependents, and so forth.

The simplest metric for model evaluation is performance accuracy — the ratio of the number of correct predictions to the total number of predictions made for a given dataset aggregated over time.
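
For example, a minimal sketch of applying a hypothetical 0.5 decision threshold and computing accuracy:

```python
import numpy as np

scores = np.array([0.91, 0.35, 0.62, 0.10, 0.55])  # raw model scores in [0, 1]
labels = np.array([1, 0, 1, 0, 0])                 # ground truth classes

decisions = (scores >= 0.5).astype(int)   # apply the decision threshold
accuracy = np.mean(decisions == labels)   # correct predictions / total predictions
print(decisions, accuracy)                # [1 0 1 0 1] 0.8
```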

A brief description and example of each classification model panel, organized by panel type, follows.

Remember, you can inspect the selected model for a panel in TruEra Diagnostics by clicking DIAGNOSTICS ↗ at the top-right of the panel.

Also, by clicking the export icon, you can export the panel's content to a CSV file for download and import into a spreadsheet or other third-party application.

Model Output

The following panels tracking model output can be configured (see Creating a Dashboard).

Mean

Tracks the average model score for each model over time.

[Figure: Model Output: Mean]

Volume

Tracks the volume of output for each model over time.

[Figure: Model Output: Volume]

Drift: Difference of Means

Tracks the absolute difference between means (production vs. baseline or known good time window) for each model over time.

[Figure: Model Output Drift: Difference of Means]

Drift: Wasserstein

Tracks the Earth Mover's Distance (EMD) between production and baseline distributions for each model over time. EMD is the minimum amount of work (distance) required to match two distributions, normalized by the total weight of the lighter distribution.
[Figure: Model Output Drift: Wasserstein]

Distribution

Tracks the distribution of scores for each model and for the labels for those models over time.

[Figure: Model Output: Distributions]

Model Decisions and Labels By Class

Tracks the distribution of model decisions (post-decision threshold) for all models, and the distribution of labels for each model.

[Figure: Model Output: Decisions and Labels By Class]

Class Distribution

Tracks the percentage of model decisions assigned to the target class.

[Figure: Model Output: Class Distribution]

Model Score Distribution

The Model Score Distribution panel compares labels and model output to check model accuracy along the distribution — Min (minimum), 5th Pctl (percentile), 25th Pctl, Median (50th percentile), 75th Pctl, 95th Pctl, and Max (maximum). For classification, this means tracking the percentage of model decisions assigned to the target class.

[Figure: Model Output: Model Score Distribution]

Following the empirical rule (i.e., virtually all data in a normal distribution falls within three standard deviations of the mean), percentiles express the percentage of the population scoring below a given value, conveying that data near the mean occur more frequently than data far from the mean. Graphically, this results in a bell-shaped curve, the precise shape of which can vary according to the distribution of the values within the population.

The population is the entire set of data points included in the distribution. The 5th Pctl reflects values ranking higher than 5% and lower than 95% of the population. Conversely, the 95th Pctl reflects scores ranking higher than 95% and lower than 5% of the distribution. The Median tallies scores that are higher than half of the population and lower than the other half, and so forth — for the 25th Pctl (higher than 25% of the population, lower than 75%) and the 75th Pctl (higher than 75% of the population, lower than 25%).
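
The same breakpoints can be computed directly; a minimal sketch over simulated scores:

```python
import numpy as np

rng = np.random.default_rng(42)
scores = rng.normal(loc=0.5, scale=0.15, size=10_000)  # simulated model scores

# The breakpoints the panel reports: Min, 5th, 25th, Median, 75th, 95th, Max.
names = ["Min", "5th Pctl", "25th Pctl", "Median", "75th Pctl", "95th Pctl", "Max"]
for name, value in zip(names, np.percentile(scores, [0, 5, 25, 50, 75, 95, 100])):
    print(f"{name:>9}: {value:.3f}")
```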

When a model score distribution bears closer investigation, you can inspect a listed model in TruEra Diagnostics by clicking DIAGNOSTICS ↗ at the top-right of the panel.

Labels

Labels, when ingested and available, indicate the ground truth or some other meaningful measure to track your model's results against.

Label Volume

Tracks the volume of labels over time.

[Figure: Label Volume]

Label Class Distributions

Reports the percentage of records over time with ground truth labels assigned to the target class.

[Figure: Label Class Distributions]

Model Performance

Performance tracking for classification models currently supports AUC measurements. Additional metrics for classification monitoring are on the roadmap.

AUC

Classification model performance is commonly visualized as the area under the curve (AUC), which represents the degree or measure of separability between classes. In other words, the higher the AUC, the better the model is at predicting that a data point properly belonging to the 0 class is classified as 0 and a point belonging to the 1 class is classified as 1.

For instance, a high AUC for a model that classifies whether or not a patient has a certain medical condition is better at distinguishing between "has" and "doesn't have" than a different model running the same data with a lower AUC.

[Figure: Model performance AUC]

Monitoring production model AUC values over time can reveal performance trends that may bear closer scrutiny with respect to the data, the model or both.
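
A minimal sketch of computing AUC for a single time window with scikit-learn, assuming both raw scores and ground truth labels are on hand for that window:

```python
from sklearn.metrics import roc_auc_score

labels = [1, 0, 1, 1, 0, 0, 1, 0]                   # ground truth for the window
scores = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3]   # raw model scores

# AUC is the probability that a randomly chosen positive example receives
# a higher score than a randomly chosen negative example.
print(roc_auc_score(labels, scores))  # 1.0 (perfect separation in this toy data)
```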

Important

Aggregated accuracy from labels and model scores is computed over the specified time range before the classification decision. Because labels may be offset from model output, always check that labels exist in the selected time range.

A model's AUC score consistently trending lower may merit closer investigation. Inspect a listed model in TruEra Diagnostics by clicking DIAGNOSTICS ↗ at the top-right of the panel.

Likewise, by clicking the export icon, you can export the panel's content to a CSV file for download and import into a spreadsheet or other third-party application.

Common Panels

These panel categories are shared by both regression and classification models, although there are some differences in panel visualizations.

Data Input

The following panels tracking model input can be configured.

Input Volume

Tracks the average input volume for each model over time.

[Figure: Input Volume]

Data Drift: Difference of Means

Tracks drift statistics for each feature as calculated by the distance between the baseline (training split or known good production time window) and the production time window using the Difference of Means distance.

[Figure: Data Drift: Difference of Means]

Data Drift: Wasserstein

Tracks drift statistics for each feature as calculated by the distance between the baseline (training split or known good production time window) and the production time window using the Wasserstein Earth Mover's Distance.

[Figure: Data Drift: Wasserstein]

Out of Range Values

A count of Out of Range errors by numerical feature and per model. A production numerical feature is considered "Out of Range" if its value lies outside the [min, max] range observed in the baseline.

[Figure: Out of Range Values]
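
A minimal sketch of the check for a single numerical feature (not TruEra's implementation):

```python
import numpy as np

def out_of_range_count(baseline_values, production_values):
    """Count production values outside the baseline's observed [min, max]."""
    lo, hi = np.min(baseline_values), np.max(baseline_values)
    production_values = np.asarray(production_values)
    return int(np.sum((production_values < lo) | (production_values > hi)))

baseline = [18, 25, 33, 47, 62]    # e.g., ages observed in the training split
production = [21, 75, 16, 40]      # 75 and 16 fall outside [18, 62]
print(out_of_range_count(baseline, production))  # 2
```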

Remember, to more closely inspect one of the listed models in TruEra Diagnostics, click DIAGNOSTICS ↗ at the top-right of the panel.

Click the export icon to export the panel's content to a CSV file for download and import into a spreadsheet or other third-party application.

Data Quality

Data Quality, or DQ for short, measures the condition of data processed by a model based on factors of accuracy, completeness, and consistency.

The following panels track the aspect indicated.

Unrecognized Categories

Tracks the count of unrecognized category errors by categorical feature per model. A production categorical feature is considered to have this error when a production value was not observed in the baseline.

[Figure: DQ - Unrecognized Categories]
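
A minimal sketch of the check for a single categorical feature:

```python
def unrecognized_count(baseline_categories, production_categories):
    """Count production values never observed for this feature in the baseline."""
    known = set(baseline_categories)
    return sum(1 for value in production_categories if value not in known)

baseline = ["US", "CA", "MX"]
production = ["US", "BR", "CA", "FR", "BR"]  # BR and FR were never seen
print(unrecognized_count(baseline, production))  # 3
```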

Numerical Issues

The Numerical Issues panel counts the instances in which numerical features exhibit NaN, NULL, Inf, or -Inf values for the time range in question.

[Figure: Numerical Issues]
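
A minimal sketch of the count for one numerical feature (assuming, as is common, that NULLs surface as NaN once ingested into a float array):

```python
import numpy as np

values = np.array([1.2, np.nan, 3.4, np.inf, -np.inf, 0.0])

nan_count = int(np.sum(np.isnan(values)))  # NaN (and ingested NULL) occurrences
inf_count = int(np.sum(np.isinf(values)))  # Inf and -Inf occurrences
print(nan_count, inf_count)                # 1 2
```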

Schema Mismatch

Schema mismatch occurs when the data source has different column (feature) names than those for which the model is configured. In other words, the model cannot match an input value to the data type and/or range it expects for the value — that is, the model and data are out of sync.

This panel tracks the count of schema mismatch errors across all features and per model.

[Figure: Schema Mismatches]
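
A minimal sketch of detecting the mismatch by comparing column names (feature names here are hypothetical):

```python
expected_columns = {"age", "income", "tenure_months"}  # what the model expects
incoming_columns = {"age", "income", "tenure"}         # what the data source sent

missing = expected_columns - incoming_columns     # model inputs with no source column
unexpected = incoming_columns - expected_columns  # source columns the model can't use
print(missing, unexpected)  # {'tenure_months'} {'tenure'}
```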

Missing Values

Tracks errors due to the absence of expected records during model processing of ingested data.

[Figure: Missing Values]

DQ Exploration

Tracks data quality errors per feature per model across the time range specified by the panel’s time selector.

[Figure: Data quality exploration table]

Custom Panels

These are panels you configure to monitor particular segments or to track and report your own user-defined metrics.

Segment Performance

These custom panels report on the segment(s) you configure for the selected model and time range.

[Figure: Segment Performance panel]

Click Next below to continue.