
Model Performance Metrics


Models trained with the latest version of SmartML (v3.10.1 and higher) will now display performance metrics to help you evaluate your model.
The "Metrics" tab (available for most model types) provides a breakdown of the values used to calculate model performance, and these can be filtered by class. Precision, Recall, and Accuracy metrics for each class can be found in the "Metrics" tab.
Different metrics are available depending on your model type:
  • Object Detection and Instance Segmentation
  • Semantic Segmentation



Performance Metrics

Loss and Loss Over Time

Loss is a value that measures model error, indicating how poor a model's predictions were for a single data point.
The Loss values for the model's Test and Validation sets are listed near the model type. Each represents the average Loss across all classes for that set.
The Loss Over Time graph plots how poorly the model performed over the duration of training. If the model's prediction was right on target, the loss is 0; a lower loss value therefore corresponds to a model that makes fewer mistakes.
The graph shows if and how quickly your model converged during training. In the example below, this semantic segmentation model shows very little loss towards the latter part of its training run.
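Convergence can also be checked directly from the raw loss values: if the average loss late in training sits far below the average early on, the model converged. A minimal sketch, using hypothetical loss values:

```python
# Sketch: checking convergence from a training loss curve.
# The loss values below are hypothetical, for illustration only.
losses = [128.6, 112.3, 98.7, 74.5, 44.1, 27.3, 12.9, 10.5, 9.4, 9.5, 9.45]

early = sum(losses[:3]) / 3   # average loss at the start of training
late = sum(losses[-3:]) / 3   # average loss at the end of training

# A large drop from the early to the late average suggests convergence.
print(f"early avg: {early:.2f}, late avg: {late:.2f}")
print("converged:", late < 0.2 * early)
```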

Mean Average Precision (mAP)

mAP is a commonly used metric for evaluating object detection models. It is calculated by finding the Average Precision (AP) for each class and then averaging over the number of classes. mAP is based on the following sub-metrics: IoU, Precision, and Recall.
For object detection and instance segmentation models, the mAP metrics for the model's Test and Validation sets are listed next to the model type.
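The averaging step can be sketched as follows; the per-class AP values here are hypothetical (SmartML computes AP for you during evaluation):

```python
# Sketch: mAP as the mean of per-class Average Precision (AP) values.
# The class names and AP values below are hypothetical examples.
average_precision = {"car": 0.82, "person": 0.74, "bicycle": 0.61}

mAP = sum(average_precision.values()) / len(average_precision)
print(f"mAP: {mAP:.3f}")
```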

Intersection Over Union

Intersection over Union (IoU) measures the amount of overlap between the predicted bounding box and the ground truth bounding box. If the overlap is perfect, the IoU is 1. If it misses entirely, the IoU is 0. A higher IoU value indicates that the predicted bounding box more closely matches the ground truth bounding box.
Predicted bounding box vs. the ground truth bounding box.
Both the Precision and Recall metrics use an IoU threshold to differentiate true positives from false positives. Object detection models trained using SmartML use an IoU threshold of 50%:
  • If IoU ≥ 0.5, the detection is considered a True Positive (TP).
  • If IoU < 0.5, it is a wrong detection and considered a False Positive (FP).
For semantic segmentation models, the average IoU metric for the Test and Validation sets is listed next to Model Type.
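The overlap computation and the 0.5 threshold described above can be sketched for axis-aligned bounding boxes. The `(x1, y1, x2, y2)` coordinate format and the example boxes are illustrative, not SmartML's internal representation:

```python
# Sketch: IoU for two axis-aligned boxes given as (x1, y1, x2, y2).
def iou(box_a, box_b):
    # Intersection rectangle (clamped to zero if the boxes don't overlap)
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Classify a detection with the 0.5 threshold described above.
predicted, truth = (0, 0, 10, 10), (5, 0, 15, 10)
score = iou(predicted, truth)
print("true positive" if score >= 0.5 else "false positive", round(score, 3))
```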


Precision

Precision represents how accurate your predictions are, i.e., the percentage of your model's predictions that were actually correct. Precision can be calculated with this formula:

Precision = TP / (TP + FP)

where TP = True Positives (predicted as positive and is correct)
FP = False Positives (predicted as positive but is incorrect)


Recall

Recall indicates how well your model finds true positives, i.e., the percentage of actual positives that your model correctly identified. Recall can be calculated with this formula:

Recall = TP / (TP + FN)

where TP = True Positives (predicted as positive and is correct)
FN = False Negatives (predicted as negative but is incorrect)
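The two formulas can be sketched with hypothetical detection counts:

```python
# Sketch: Precision and Recall from raw counts.
# The counts below are hypothetical, for illustration only.
tp, fp, fn = 80, 20, 40

precision = tp / (tp + fp)   # fraction of predictions that were correct
recall = tp / (tp + fn)      # fraction of actual positives that were found
print(f"precision: {precision:.2f}, recall: {recall:.2f}")
```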
For object detection and instance segmentation models, Precision and Recall metrics are broken down by class in the "Metrics" sub-tab:

Confusion Matrix

A confusion matrix is a summary of the prediction results from a classification model. The number of correct and incorrect predictions is summarized with counts, broken down by class. The confusion matrix highlights classes that performed poorly in training and may require additional or more varied data to achieve higher performance.
For classification models, the confusion matrix is found in the "Confusion Matrix" sub-tab:
Confusion matrix for a Classification model.
Use the "Result Set" selector to choose the set to analyze: Test, Training, or Validation. The Labeled Class filter lets you show results for only selected classes. You can also choose to display the results from "Most Confused" or "Most Common".
The confusion matrix only supports displaying up to 7 classes. As a result, some incorrect predictions may not be shown in the "Most Confused" (default) view.


Accuracy

Accuracy measures how well the model performs across all classes. It is the percentage of correct predictions out of the total number of predictions:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where TP = True Positives, TN = True Negatives, FP = False Positives, FN = False Negatives
The accuracy metric is less useful when datasets are not balanced. If samples aren't distributed evenly across classes, the accuracy will be higher for the classes with more samples.
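A small sketch of this pitfall, using a hypothetical imbalanced two-class dataset:

```python
# Sketch: why accuracy is misleading on an imbalanced dataset.
# Hypothetical labels: 95 "normal" samples and only 5 "defect" samples.
actual = ["normal"] * 95 + ["defect"] * 5
# A model that always predicts "normal" never finds a single defect...
predicted = ["normal"] * 100

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)
print(f"accuracy: {accuracy:.2f}")  # ...yet its accuracy is still high
```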

F1 Score

A model's F1 score measures the balance between Precision and Recall by taking their harmonic mean:

F1 = 2 × (Precision × Recall) / (Precision + Recall)

If both Precision and Recall are high, F1 will be high, and vice versa.
For object detection and instance segmentation models, the F1 scores are broken down by class under the "Metrics" sub-tab.
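The harmonic mean can be sketched as follows; the input values are hypothetical. Note how a high Precision cannot compensate for a low Recall:

```python
# Sketch: F1 as the harmonic mean of Precision and Recall.
def f1_score(precision, recall):
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Balanced metrics give a high F1; a low Recall drags F1 down.
print(round(f1_score(0.9, 0.9), 3))
print(round(f1_score(0.9, 0.1), 3))
```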

Mean Squared Error

Mean Squared Error (MSE) is the average of the squared differences between the actual and predicted values. In short, it is the average of a set of squared errors. The lower the MSE, the closer the predicted values are to the regression line and the more precise the model's predictions are. Large deviations and outliers will lead to a higher MSE.
The formula for MSE is as follows:

MSE = (1/n) × Σ (yᵢ − ŷᵢ)²

where n is the number of samples, yᵢ is the actual value, and ŷᵢ is the predicted value.
For regression models, the MSE metrics for the model's Test and Validation sets are listed next to Model Type.

Mean Absolute Error

Mean Absolute Error (MAE) is the average of the absolute differences between the predicted values and the actual values. It measures how far the predicted values are from the actual values. MAE is calculated by taking the sum of all the absolute errors and dividing it by the number of errors:

MAE = (1/n) × Σ |yᵢ − ŷᵢ|
For regression models, the MAE is listed next to the MSE metric.
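Both error metrics can be sketched together, using hypothetical actual and predicted values:

```python
# Sketch: MSE and MAE for a regression model's predictions.
# The values below are hypothetical, for illustration only.
actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

errors = [a - p for a, p in zip(actual, predicted)]
mse = sum(e ** 2 for e in errors) / len(errors)   # squaring penalizes outliers more
mae = sum(abs(e) for e in errors) / len(errors)
print(f"MSE: {mse:.4f}, MAE: {mae:.4f}")
```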

Exporting Metrics

To export your model's metrics to a JSON file, click the "Export All Metrics" button at the top right of the "Model Performance" tab.
A JSON file will be downloaded with the metrics data.
Example output:
{
"localPath": "/tmp/tmp9eydvznd/jobstate.json",
"writeUrl": "gs://plainsight-current-uploads/organizations/01G233E8JB7G8E1HXR5QM621KT/models/01G23T3S41ZTAYM08M207JZT9N/versions/01GAM0NZVVYKCEYSZTZ61J47KL/job_state.json",
"webhookUrl": "",
"running": false,
"successful": true,
"error": false,
"metrics": {
"train": {
"mean-error": -0.67,
"max-error": 116.5,
"mean-absolute-error": 5.6527,
"mean-squared-error": 69.2156,
"r2-score": 0.9583325512348129,
"explained-variance-score": 0.9586028188475673
"validation": {
"mean-error": -0.6953,
"max-error": 50.625,
"mean-absolute-error": 11.66,
"mean-squared-error": 226.484,
"r2-score": 0.861301830727059,
"explained-variance-score": 0.8615978471425298
"test": {
"mean-error": 0.7059,
"max-error": 47.25,
"mean-absolute-error": 12.3894,
"mean-squared-error": 231.2662,
"r2-score": 0.8229556668100615,
"explained-variance-score": 0.823337082603099
"datasetStats": {
"train": {
"regression": {
"bone-age-in-months": 1008
"validation": {
"regression": {
"bone-age-in-months": 126
"test": {
"regression": {
"bone-age-in-months": 126
"losses": [128.58590698242188, 112.29412841796875, 98.68473052978516, 99.66419219970703, 93.07553100585938, 74.54474639892578, 44.13039779663086, 50.251869201660156, 55.646514892578125, 27.341781616210938, 12.859375, 20.522308349609375, 19.0390625, 22.59375, 21.62890625, 19.59375, 14.47265625, 17.88671875, 16.23046875, 15.5927734375, 10.47955322265625, 24.02734375, 19.536376953125, 13.8046875, 7.815460205078125, 14.26953125, 7.79736328125, 15.982421875, 19.9453125, 12.318603515625, 9.4156494140625, 26.134765625, 10.478515625, 10.86328125, 12.28173828125, 10.34814453125, 12.5625, 7.5040283203125, 23.20703125, 7.640625, 11.1124267578125, 12.40283203125, 18.5625, 5.1968994140625, 5.38671875, 13.171875, 12.41015625, 9.43603515625, 15.0859375, 13.0625, 7.8671875, 6.38671875, 11.6436767578125, 10.06396484375, 15.85595703125, 14.501953125, 9.32080078125, 6.970703125, 7.55859375, 9.486328125, 10.146484375, 10.26123046875, 13.27734375, 9.6630859375, 7.2188720703125, 9.50390625, 11.004150390625, 9.450714111328125],
"validationLosses": [126.23751831054688, 122.13639068603516, 104.75628662109375, 91.34307861328125, 71.07752227783203, 60.4954833984375, 54.910919189453125, 64.7872543334961, 50.90571212768555, 45.65102767944336, 28.212173461914062, 22.134231567382812, 18.24211883544922, 17.055978775024414, 16.99329948425293, 18.339128494262695, 15.311100959777832, 16.350008010864258, 14.701228141784668, 14.0035982131958, 15.21068286895752, 15.39898681640625, 13.84455394744873, 17.474031448364258, 12.823548316955566, 13.224980354309082, 13.643527030944824, 14.051445960998535, 13.345271110534668, 12.7284517288208, 12.541560173034668, 13.01000690460205, 12.757511138916016, 13.465849876403809, 11.803929328918457, 13.80634593963623, 11.42141056060791, 14.909077644348145, 14.49254322052002, 12.919448852539062, 13.350934028625488, 14.412301063537598, 12.09685230255127, 16.210813522338867, 11.7015962600708, 13.174674034118652, 12.80190372467041, 10.813069343566895, 11.792767524719238, 12.25625228881836, 13.054171562194824, 11.39897632598877, 13.25074291229248, 10.839336395263672, 11.68685531616211, 12.75892162322998, 11.884628295898438, 13.786502838134766, 11.2650728225708, 11.68310546875, 11.8684663772583, 10.997429847717285, 12.028834342956543, 12.677250862121582, 12.99577808380127, 11.731112480163574, 12.3883638381958, 11.17391586303711],
"lossTimestamps": ["2022-08-16T19:09:25.920362920Z", "2022-08-16T19:09:41.930844930Z", "2022-08-16T19:10:02.510491510Z", "2022-08-16T19:10:22.442184442Z", "2022-08-16T19:10:44.484191484Z", "2022-08-16T19:11:04.735038735Z", "2022-08-16T19:11:25.646019646Z", "2022-08-16T19:11:45.270882270Z", "2022-08-16T19:11:58.744790744Z", "2022-08-16T19:12:18.683266683Z", "2022-08-16T19:12:38.753830753Z", "2022-08-16T19:12:59.060100060Z", "2022-08-16T19:13:19.769771769Z", "2022-08-16T19:13:39.341674341Z", "2022-08-16T19:14:00.366466366Z", "2022-08-16T19:14:20.292903292Z", "2022-08-16T19:14:33.825213825Z", "2022-08-16T19:14:53.408930408Z", "2022-08-16T19:15:06.890814890Z", "2022-08-16T19:15:27.659750659Z", "2022-08-16T19:15:47.392140392Z", "2022-08-16T19:16:01.398868398Z", "2022-08-16T19:16:14.989999989Z", "2022-08-16T19:16:35.165963165Z", "2022-08-16T19:16:48.536490536Z", "2022-08-16T19:17:08.332645332Z", "2022-08-16T19:17:21.726212726Z", "2022-08-16T19:17:35.244003244Z", "2022-08-16T19:17:48.682616682Z", "2022-08-16T19:18:02.013123013Z", "2022-08-16T19:18:22.031874031Z", "2022-08-16T19:18:41.919013919Z", "2022-08-16T19:18:55.325515325Z", "2022-08-16T19:19:08.748890748Z", "2022-08-16T19:19:22.121322121Z", "2022-08-16T19:19:41.627127627Z", "2022-08-16T19:19:54.964345964Z", "2022-08-16T19:20:15.148004148Z", "2022-08-16T19:20:28.665821665Z", "2022-08-16T19:20:42.194088194Z", "2022-08-16T19:20:55.725006725Z", "2022-08-16T19:21:09.281984281Z", "2022-08-16T19:21:22.692514692Z", "2022-08-16T19:21:36.152768152Z", "2022-08-16T19:21:49.692785692Z", "2022-08-16T19:22:03.250835250Z", "2022-08-16T19:22:16.731332731Z", "2022-08-16T19:22:30.243814243Z", "2022-08-16T19:22:50.143378143Z", "2022-08-16T19:23:03.792629792Z", "2022-08-16T19:23:17.354292354Z", "2022-08-16T19:23:30.906412906Z", "2022-08-16T19:23:44.324004324Z", "2022-08-16T19:23:57.697388697Z", "2022-08-16T19:24:11.075845075Z", "2022-08-16T19:24:24.510916510Z", "2022-08-16T19:24:37.975942975Z", "2022-08-16T19:24:51.383201383Z", 
"2022-08-16T19:25:04.839433839Z", "2022-08-16T19:25:18.195213195Z", "2022-08-16T19:25:31.702544702Z", "2022-08-16T19:25:45.279693279Z", "2022-08-16T19:25:58.858908858Z", "2022-08-16T19:26:12.419733419Z", "2022-08-16T19:26:26.014491014Z", "2022-08-16T19:26:39.522727522Z", "2022-08-16T19:26:53.074800074Z", "2022-08-16T19:27:06.575330575Z"],
"startTime": 1660676941.5795865,
"duration": 1105.4098682403564,
"accuracies": {
"train": {
"accuracy": 0.0,
"threshold": 0.5
"validation": {
"accuracy": 0.0,
"threshold": 0.5
"test": {
"accuracy": 0.0,
"threshold": 0.5