
Understanding Quantitative Input Influence (QII)

The features in a data collection directly influence the predictive performance of the models using the data. Understanding these influences aids in explaining a model's predictions. But how is influence calculated? How is it scored?

TruEra's quantitative input influence (QII) values are unbiased estimates of Shapley values that quantify the contribution (influence) of individual features in making a model’s decision on a given datapoint.

The topics below explain how QII is applied and the computational parameters involved.

For a given model, the marginal contribution of a feature is calculated by (see the sketch below):

  • analyzing the difference in the model's output with the feature included versus excluded
  • averaging that difference over all N! possible feature orderings, equivalently over all subsets of the remaining features
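
A minimal sketch of that averaging, assuming the model is exposed as a `predict` callable over NumPy arrays; the function and parameter names here are illustrative, not TruEra's API:

```python
import numpy as np

def sampled_qii(predict, x, background, num_orderings=200, rng=None):
    """Monte Carlo estimate of per-feature marginal contributions (illustrative sketch).

    For each sampled feature ordering, features are switched one at a time
    from a background (comparison) point's value to the explained point's
    value; the resulting change in model output is credited to the switched
    feature.
    """
    rng = np.random.default_rng(rng)
    k = x.shape[0]
    contributions = np.zeros(k)
    for _ in range(num_orderings):
        order = rng.permutation(k)
        # Start from a randomly drawn comparison point.
        z = background[rng.integers(len(background))].copy()
        prev = predict(z[None, :])[0]
        for i in order:
            z[i] = x[i]                       # "include" feature i
            curr = predict(z[None, :])[0]
            contributions[i] += curr - prev   # marginal contribution of feature i
            prev = curr
    return contributions / num_orderings
```

By construction, the returned values sum (approximately) to the model's prediction on `x` minus its average prediction on the sampled comparison points.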

What is QII all about?

QII first calculates the influence of a group of features acting together. Then, using the same cooperative game theory concept as SHAP (the Shapley value), it calculates the marginal contribution of each feature to the model score. To isolate the individual contribution of a feature to model outputs, QII logically breaks the correlation between features.
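
One way to picture "breaking the correlation" is to intervene on a chosen group of features: they keep the values they have in the point being explained, while every other feature is resampled from the background data independently of that point, severing any statistical link between the two groups. A hedged sketch under the same illustrative `predict`/NumPy assumptions as above:

```python
import numpy as np

def set_influence(predict, x, background, feature_set, num_samples=500, rng=None):
    """Estimate the joint influence of a group of features on the prediction for x.

    Features in `feature_set` are held at their values in x; all other
    features are drawn from the background data independently of x -- an
    intervention that breaks their correlation with the fixed features.
    """
    rng = np.random.default_rng(rng)
    idx = rng.integers(len(background), size=num_samples)
    samples = background[idx].copy()
    samples[:, feature_set] = x[feature_set]   # hold the chosen group at x's values
    with_set = predict(samples).mean()         # expected output under the intervention
    baseline = predict(background).mean()      # expected output with nothing held fixed
    return with_set - baseline
```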

As a practical matter, consider a loan prediction model that uses input features like income, debt, and age to determine whether an individual should receive a loan. We might use QII values to explain the extent to which Jane’s income contributed to her loan rejection. Likewise, aggregating QII values across a dataset can give high-level model insights, such as what features are most important in the global decision-making process. Here, we might see that, on average, a person’s debt is most influential in a model’s final decision.
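
For instance, once per-row QII values are available (one value per feature per datapoint), Jane's row can be read off directly as a local explanation, while a common global summary is the mean absolute influence of each feature across the dataset. A minimal sketch with toy numbers chosen purely for illustration:

```python
import numpy as np

feature_names = ["income", "debt", "age"]            # illustrative features
qii_values = np.array([[-0.30,  0.05, 0.02],         # Jane's row: income pushed her score down most
                       [ 0.10, -0.40, 0.01],
                       [ 0.05, -0.25, 0.03]])

# Local explanation: the influences for a single datapoint (Jane).
jane = dict(zip(feature_names, qii_values[0]))

# Global importance: average magnitude of influence across all datapoints.
global_importance = dict(zip(feature_names, np.abs(qii_values).mean(axis=0)))
print(jane)
print(global_importance)
```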

Why use QIIs as opposed to other methods?

QIIs (and Shapley values in general) carefully account for correlated inputs while measuring influence, satisfying a variety of desirable properties that make them well-suited for generating model explanations.

For example, given a model with features \(f_1 \cdots f_k\) and a set of "background" or comparison points, we can compute QII values \(q_1 \cdots q_k\) per feature such that:

  1. For any given datapoint, the sum of the influences \(\sum_{i}q_i\) must equal the model prediction on that datapoint minus the mean model prediction on the comparison points.
  2. \(q_i = 0\) if the model does not use feature \(i\).
  3. \(q_i = q_j\) if the model treats features \(i\) and \(j\) identically.
  4. If the value of \(f_i\) changes such that, regardless of other features, the model prediction increases, then \(q_i\) must also increase.
  5. If the model prediction is composed of a linear combination of two intermediate predictions \(\alpha\) and \(\beta\), then \(q_i\) for the final model must be the same weighted combination of the QII values measured with respect to \(\alpha\) and \(\beta\).
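
These properties can be checked by hand for a linear model, where the interventional QII of feature \(i\) reduces to the weight times the gap between the feature's value and its background mean, a standard result for Shapley values of linear models. The snippet below is an illustration of that check, not TruEra code:

```python
import numpy as np

rng = np.random.default_rng(0)
w, b = np.array([2.0, -1.0, 0.0]), 0.5           # feature 3 has weight 0, so q_3 should be 0
background = rng.normal(size=(1000, 3))          # comparison points
x = np.array([1.0, 2.0, 3.0])                    # datapoint to explain

predict = lambda X: X @ w + b
qii = w * (x - background.mean(axis=0))          # closed-form QII for a linear model

# Property 1: influences sum to the prediction minus the mean comparison prediction.
assert np.isclose(qii.sum(), predict(x[None, :])[0] - predict(background).mean())
# Property 2: an unused feature gets zero influence.
assert qii[2] == 0.0
```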

Generating QIIs requires only black-box access to the model, with no need for code analysis or knowledge of its inner workings, although some knowledge of the input dataset originally used to train the model is required.

Why don't influence values sum to the model prediction?

We can only explain a model's predictions in the context of other points. Take, for example, a model that always outputs a fixed value of 42, no matter the input. Because the score never changes, the QII values for any point are all zero, so they cannot possibly sum to the model score of 42. They do, however, sum to the model prediction minus the mean model prediction on the comparison points, which here is zero. In general, QIIs sum to that difference rather than to the raw prediction.
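
In symbols, for the constant model \(f(x) = 42\) every marginal contribution vanishes, so

\[
q_i = 0 \ \text{ for all } i, \qquad \sum_i q_i = 0 = 42 - 42 = f(x) - \mathbb{E}\bigl[f(\text{comparison points})\bigr].
\]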

Why are feature influences expensive to compute?

Computing exact QII (Shapley) values requires a number of model evaluations that grows exponentially with the number of features, a cost that is provably unavoidable in the general case; this is why QII relies on sampling-based, unbiased estimates.
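
Concretely, the exact Shapley value of feature \(i\) averages its marginal contribution over every subset \(S\) of the remaining features:

\[
q_i = \sum_{S \subseteq \{1,\dots,k\} \setminus \{i\}} \frac{|S|!\,(k-|S|-1)!}{k!}\,\bigl[v(S \cup \{i\}) - v(S)\bigr],
\]

where \(v(S)\) is the expected model output when the features in \(S\) are fixed to the explained datapoint's values and the rest are drawn from the comparison points. With \(k\) features there are \(2^{k-1}\) such subsets per feature, hence the exponential cost.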

How can I speed up my QII computations?

Improve your QII computation speed with either of these two methods:

  1. For tree-based and linear models, take advantage of the model's inherent structure (tree splits or linearity) to speed up QII computations. See Ingesting Python Models to verify whether your model type supports this optimization.

  2. Modify the QII parameters that affect the speed of the computation.
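
The second option is, at heart, a Monte Carlo accuracy-for-speed trade: fewer sampled orderings and fewer comparison points mean fewer model evaluations but noisier estimates. As a rough illustration, using the cost model of the hypothetical sampling estimator sketched earlier (these parameter names are illustrative, not TruEra settings):

```python
# Each sampled ordering needs one prediction per feature, plus one
# prediction for the starting comparison point.
def evaluation_budget(num_features, num_orderings):
    return num_orderings * (num_features + 1)

# Halving the number of sampled orderings halves the model-call budget,
# at the cost of a higher-variance QII estimate.
print(evaluation_budget(num_features=30, num_orderings=200))   # 6200 calls
print(evaluation_budget(num_features=30, num_orderings=100))   # 3100 calls
```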
