Ingesting Custom Metrics¶
TruEra Monitoring lets you ingest custom metrics of different types for both technical and business monitoring.
The three classes of custom metrics (with examples) are:
- A general metric, not tied to any specific model or record.
- A model metric, tied to a specific model.
- A record metric, tied to a specific model and a record id.
Each is described more fully below, along with the respective ingestion guidance.
General Metrics¶
A general metric is any business metric you want to track on your monitoring dashboards that is not explicitly tied to one model. Typical uses include tracking overall performance on a task handled by a suite of models working together, or performance on a task that each model in the suite is being A/B tested to handle independently.
For instance, assume a suite of predictive maintenance models, each for a different machine or vehicle. A general metric could be used to aggregate maintenance and repair costs over time for the entire suite. Similarly, for a suite of fraud prevention models, you could use a general metric to measure fraud losses due to false negatives over time and another for lost sales due to false positives.
In the case of the former, a suite of predictive maintenance models, the ingested data for a general metric tracking aggregated maintenance costs over time might look like this:
Time | Metric Name | Value |
---|---|---|
T1 | maintenance_cost | 100.0 |
T2 | maintenance_cost | 200.0 |
T3 | maintenance_cost | 3000.0 |
T4 | maintenance_cost | 40.0 |
T5 | maintenance_cost | 500.0 |
... | ... | ... |
Creating/ingesting a general metric requires two steps, which can be performed in either order:
- Send data for the metric.
- Specify the type of aggregation. This is done when creating the dashboard widget.
Step 1. Send the metric data
TruEra's Python SDK provides helpers for reporting general metrics. Using the following as a guide, choose a method and replace the example values with your own.
import datetime

from truera.client.metrics import getGeneralMetricReporter

# Create a general metric reporter (`tru` is your TruEra workspace object):
generalMetricReporter = getGeneralMetricReporter(tru)

# Send a point with a metric name and a time to be used for aggregation
a_few_minutes_ago = datetime.datetime.utcnow() - datetime.timedelta(minutes=15)
generalMetricReporter.send("myMetric", 11.0, time=a_few_minutes_ago)

# If time is omitted, it is inferred to be the current time
generalMetricReporter.send("myMetric", 11.0)

# Multiple metrics can also be sent together as a map
metrics = {"myMetric1": 11.0, "myMetric2": 12.0}
generalMetricReporter.sendMetrics(metrics)
Notes
- The numeric component of a general metric is a 64-bit floating point number.
General metrics can also be sent as raw REST requests.
curl --location --request \
POST 'https://<TruEra deployment>/api/v0/ingest/streaming/metric' \
--header <auth header> \
--header 'Content-Type: text/plain' \
--data-raw '{
"project_id": "<project_id>",
"timestamp": "<timestamp in RFC 3339 format>",
"metrics": {
"metric1": 11.0,
"metric2": 12.0,
}
}'
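If you prefer to issue the same call from Python, the sketch below uses the requests library to post the payload shown in the curl example. The deployment URL, auth header contents, and project id are placeholders, and the exact Authorization format shown here is an assumption; substitute whatever your deployment requires.

import datetime
import json

import requests

# Placeholders: replace with your deployment host, auth header, and project id.
DEPLOYMENT = "https://<TruEra deployment>"
headers = {
    "Authorization": "<auth header value>",  # assumption: adapt to your auth scheme
    "Content-Type": "text/plain",            # matches the curl example above
}
payload = {
    "project_id": "<project_id>",
    "timestamp": datetime.datetime.utcnow().isoformat() + "Z",  # RFC 3339 format
    "metrics": {"metric1": 11.0, "metric2": 12.0},
}

response = requests.post(
    f"{DEPLOYMENT}/api/v0/ingest/streaming/metric",
    headers=headers,
    data=json.dumps(payload),
)
response.raise_for_status()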
Step 2. Specify metric aggregation type
The raw numerical metric data can be aggregated in the following ways (a short worked example follows the list):
- Count
- Sum
- Average
- Max
- Min
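To make the aggregation options concrete, here is a small illustrative calculation in plain Python over the sample maintenance_cost values from the table above. This is for illustration only; the dashboard widget performs the aggregation for you.

# Sample maintenance_cost points from the table above
values = [100.0, 200.0, 3000.0, 40.0, 500.0]

count = len(values)      # Count   -> 5
total = sum(values)      # Sum     -> 3840.0
average = total / count  # Average -> 768.0
maximum = max(values)    # Max     -> 3000.0
minimum = min(values)    # Min     -> 40.0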
Model Metrics¶
A model metric is any business metric tied to a specific model. Model metrics let you create dashboards that track each model's performance against its defined KPIs.
For example, if you have a suite of predictive maintenance models for different machines or vehicles, a model metric could cover the maintenance and repair costs based on the predictions each model has made over time. Similarly, for a suite of fraud prevention models, you could have one model metric for fraud losses from false negatives and another for lost sales from false positives, tracked per model.
The table below shows what the ingested data could look like for a model metric on a suite of predictive maintenance models. You'll note that it includes Model Id for direct correlation with a specific model. The sum of costs can then be aggregated and tracked by model, leading to an assessment of which model's predictions have the most/least impact on the business.
Time | Metric Name | Model Id | Value |
---|---|---|---|
T1 | maintenance_cost | M1 | 100.0 |
T2 | maintenance_cost | M1 | 200.0 |
T3 | maintenance_cost | M2 | 3000.0 |
T4 | maintenance_cost | M1 | 40.0 |
T5 | maintenance_cost | M2 | 500.0 |
... | ... | ... | ... |
Even if models M1 and M2 have the same accuracy, the sum over time may lead you to conclude that M2's mistakes are costlier than M1's — a difference worth investigating.
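As a quick illustration of that comparison, the pandas sketch below sums the sample values by model. This is illustrative only; the dashboard computes the aggregation for you.

import pandas as pd

# Sample rows from the table above
df = pd.DataFrame({
    "model_id": ["M1", "M1", "M2", "M1", "M2"],
    "maintenance_cost": [100.0, 200.0, 3000.0, 40.0, 500.0],
})

# Sum the metric per model, mirroring a Sum aggregation grouped by model
per_model = df.groupby("model_id")["maintenance_cost"].sum()
print(per_model)  # M1 -> 340.0, M2 -> 3500.0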
Creating/ingesting a model metric requires two steps, which can be performed in either order:
- Send data for the metric.
- Specify the type of aggregation. This is done when creating the dashboard widget.
Step 1. Send the metric data
TruEra's Python SDK provides helpers for reporting model metrics. Using the following as a guide, choose a method and replace the example values with your own.
import datetime

from truera.client.metrics import getModelMetricReporter

# Create a model metric reporter
# Note: the model metric reporter will use the model in tru's current context.
modelMetricReporter = getModelMetricReporter(tru)

# Send a point, specifying a metric name and a time to be used for aggregation
a_few_minutes_ago = datetime.datetime.utcnow() - datetime.timedelta(minutes=15)
modelMetricReporter.send("myMetric", 11.0, time=a_few_minutes_ago)

# The time can be omitted and inferred as the current time
modelMetricReporter.send("myMetric", 11.0)

# Multiple metrics can also be sent together as a map
metrics = {"myMetric1": 11.0, "myMetric2": 12.0}
modelMetricReporter.sendMetrics(metrics)
Notes
- The numeric component of a model metric is a 64-bit floating point number.
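Because the reporter is bound to the model in tru's current context, you can report the same business metric for each model in a suite by switching the context before creating each reporter. The sketch below assumes your SDK version exposes a context switch such as tru.set_model; the model names and value are hypothetical.

# Sketch: report the same business metric for two models in a suite.
# Assumption: tru.set_model (or an equivalent) switches the model context.
for model_name in ["predictive_maintenance_m1", "predictive_maintenance_m2"]:
    tru.set_model(model_name)               # assumed context switch
    reporter = getModelMetricReporter(tru)  # bound to the current model
    reporter.send("maintenance_cost", 123.0)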
Model metrics can also be sent as raw REST requests:
curl --location --request POST 'https://<TruEra deployment>/api/v0/ingest/streaming/metric' \
--header <auth header> \
--header 'Content-Type: text/plain' \
--data-raw '{
"project_id": "<project_id>",
"model_id": "<model_id>",
"timestamp": "<timestamp in RFC 3339 format>",
"metrics": {
"metric1": 11.0,
"metric2": 12.0,
}
}'
Step 2. Specify metric aggregation type
The raw numerical metric data can be aggregated in the following ways:
- Count
- Sum
- Average
- Max
- Min
Record Metrics¶
A record metric is any metric tied to a specific model and an individual record (point). Here are a few examples:
- Any business KPI for a record (customer lifetime value, fraud loss, cost of acquiring this customer, etc.)
- Any analytical metric at the record level, such as expected loss (probability of loss times value) or expected value (probability of each outcome times the value of that outcome); see the short sketch after this list
- MLOps metrics, such as latency of the prediction call, number of retries, etc.
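For instance, an expected-loss record metric is simply the predicted probability of loss multiplied by the amount at risk for that record. A minimal sketch, with hypothetical numbers:

# Expected loss = probability of loss x value at risk (hypothetical numbers)
probability_of_loss = 0.03   # model-predicted probability for this record
value_at_risk = 12000.0      # exposure for this record
expected_loss = probability_of_loss * value_at_risk  # 360.0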
The table below shows what the ingested data could look like for a record (point) metric on a suite of predictive maintenance models. Note that it includes a Model Id, for direct correlation with a specific model, as well as a Point Id, for correlation with a given record. The sum of costs can then be aggregated and tracked by model, showing which model's predictions have the most or least impact on the business, and the Point Id can later be used to correlate each value with the specific prediction the model made.
Time | Metric Name | Model Id | Point Id | Value |
---|---|---|---|---|
T1 | maintenance_cost | M1 | point_M1_T1 | 100.0 |
T2 | maintenance_cost | M1 | point_M1_T2 | 200.0 |
T3 | maintenance_cost | M2 | point_M2_T3 | 3000.0 |
T4 | maintenance_cost | M1 | point_M1_T4 | 40.0 |
T5 | maintenance_cost | M2 | point_M2_T5 | 500.0 |
... | ... | ... | ... | ... |
Creating/ingesting a record metric requires two steps, which can be performed in either order:
- Send data for the metric.
- Specify the type of aggregation. This is done when creating the dashboard widget.
Step 1. Send the metric data
TruEra's Python SDK provides helpers for reporting record (point) metrics. Using the following as a guide, choose a method and replace the example values with your own.
import datetime

from truera.client.metrics import getPointMetricReporter

# Create a point metric reporter:
pointMetricReporter = getPointMetricReporter(tru)

# Id of the prediction data point (record) the metric applies to
point_id = "<id value of prediction data point>"

# Send a point with a metric name and a time to be used for aggregation
a_few_minutes_ago = datetime.datetime.utcnow() - datetime.timedelta(minutes=15)
pointMetricReporter.send(point_id, "myMetric", 11.0, time=a_few_minutes_ago)

# If time is omitted, it is inferred to be the current time
pointMetricReporter.send(point_id, "myMetric", 11.0)

# Multiple metrics can also be sent together as a map
metrics = {"myMetric1": 11.0, "myMetric2": 12.0}
pointMetricReporter.sendMetrics(point_id, metrics)
Notes
- The numeric component of a point metric is a 64-bit floating point number.
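Building on the expected-loss example earlier in this section, a common pattern is to compute a record-level metric for each scored record and report it against that record's id. A minimal sketch, assuming `predictions` is an iterable of (point_id, probability of loss, exposure) tuples you maintain, and using the pointMetricReporter created above:

# Hypothetical scored records: (point_id, predicted probability of loss, exposure)
predictions = [
    ("point_M1_T1", 0.02, 5000.0),
    ("point_M2_T3", 0.10, 30000.0),
]

for point_id, p_loss, exposure in predictions:
    expected_loss = p_loss * exposure
    # Report the metric against the record id so it can later be correlated
    # with the prediction the model made for that record.
    pointMetricReporter.send(point_id, "expected_loss", expected_loss)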
Point metrics can also be sent as raw REST requests.
curl --location --request \
POST 'https://<TruEra deployment>/api/v0/ingest/streaming/metric' \
--header <auth header> \
--header 'Content-Type: text/plain' \
--data-raw '{
"project_id": "<project_id>",
"point_id": "<point_id>",
"timestamp": "<timestamp in RFC 3339 format>",
"metrics": {
"metric1": 11.0,
"metric2": 12.0,
}
}'
Step 2. Specify metric aggregation type
The raw numerical metric data can be aggregated in the following ways:
- Count
- Sum
- Average
- Max
- Min
Click Next below to continue.