

Tune your model's training settings with advanced options


A hyperparameter is a configuration value that is external to the model and can be tuned to optimize the model's performance. Hyperparameters directly affect training behavior, so it is important to tune them correctly. The hyperparameters that can be configured include:
  • dataset splits
  • backbone
  • learning rate
  • batch size
  • resolution
  • tiling
  • augmentation

SmartML Hyperparameters

Hyperparameters can be configured when creating a new model or a new model version. Open the optional SmartML Hyperparameters section at the bottom of Add New Model or Add New Model Version to adjust these advanced settings for your training run.

Split Dataset

Split your data into Training, Validation, and Test datasets for model training. The defaults are: 80% Training, 10% Validation, and 10% Test. You can also select whether or not to randomize the order of the data between the splits.
Read about why splitting datasets is valuable when training computer vision models.
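
As a rough illustration of what the default 80/10/10 split with randomization does, here is a minimal Python sketch. The function name `split_dataset`, its parameters, and the fixed seed are assumptions made for this example; they are not part of SmartML.

```python
import random


def split_dataset(items, train=0.8, val=0.1, shuffle=True, seed=0):
    """Split items into training, validation, and test lists.

    Hypothetical helper: the test split receives whatever remains
    after the training and validation portions are taken.
    """
    items = list(items)
    if shuffle:
        # Randomize the order before splitting so each split is a random sample.
        random.Random(seed).shuffle(items)
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    return (
        items[:n_train],
        items[n_train:n_train + n_val],
        items[n_train + n_val:],
    )


train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))  # 80 10 10
```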


Backbone

Backbones are reusable models that are leveraged across multiple computer vision tasks, such as classification and object detection. They are pre-trained on a large, open-source dataset, so they already encode general visual features that a new model can build on.

ResNet Backbones

Residual Network, or ResNet for short, is a classic neural network that was introduced in 2015 and is used as a backbone for many computer vision tasks. SmartML supports the following ResNet backbones:
  • R50 FPN 3X (ResNet-50)
  • R101 FPN 3X (ResNet-101)
  • X101 FPN 3X (ResNeXt-101 32x8d)
Larger models are usually more accurate, but lead to slower inference times. Relative to R50_FPN_3X, inference times are:
  • R101_FPN_3X - 1.3x
  • X101_FPN_3X - 2.27x
Accuracy is harder to predict because it depends on the size, quality, and variance of the dataset, among other factors. On the standard MS-COCO benchmark, accuracy relative to R50_FPN_3X is roughly:
  • R101_FPN_3X - 1.1x
  • X101_FPN_3X - 1.2x
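
One way to reason about this trade-off is to pick the most accurate backbone that still fits a latency budget. The sketch below uses only the relative figures listed above; the `pick_backbone` helper and the idea of a latency budget are assumptions for illustration, and real accuracy will vary with your dataset.

```python
# Relative figures from the lists above (R50_FPN_3X is the 1.0 baseline).
BACKBONES = {
    "R50_FPN_3X": {"latency": 1.0, "accuracy": 1.0},
    "R101_FPN_3X": {"latency": 1.3, "accuracy": 1.1},
    "X101_FPN_3X": {"latency": 2.27, "accuracy": 1.2},
}


def pick_backbone(max_relative_latency):
    """Return the most accurate backbone within the latency budget."""
    candidates = {
        name: stats
        for name, stats in BACKBONES.items()
        if stats["latency"] <= max_relative_latency
    }
    if not candidates:
        raise ValueError("no backbone fits the latency budget")
    return max(candidates, key=lambda name: candidates[name]["accuracy"])


print(pick_backbone(1.5))  # R101_FPN_3X
```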

EfficientNet Backbones

EfficientNet backbones were introduced in 2019 and are more accurate than ResNet backbones at similar inference times. They provide efficient inference on CPU as well as fast model training. EfficientNet variants support all model types. SmartML supports the following EfficientNets:
  • efficientnet_b0 through efficientnet_b7 - available for all models. Default for cloud models is efficientnet_b3.
  • efficientdet_lite0 through efficientdet_lite4 - available for mobile models only. Default for mobile models is efficientdet_lite2.

Vision Transformer Backbones

The Vision Transformer (ViT) is a near state-of-the-art model for image classification that applies a Transformer-like architecture to patches of the image. ViT is supported only for classification, regression, and semantic segmentation model types. SmartML supports the following ViT variants:
  • "vit_tiny_patch16_224_in21k" (vit tiny)
  • "vit_small_patch16_224_in21k" (vit small)
  • "vit_base_patch16_224_in21k" (vit base)
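
To make "patches of the image" concrete, the sketch below counts the patches a ViT would process. It assumes the variant names follow the common convention in which `patch16` means 16x16-pixel patches and `224` means a 224x224-pixel input; that reading of the names, and the `num_patches` helper, are assumptions rather than something this page states.

```python
def num_patches(image_size: int, patch_size: int) -> int:
    """Count the non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


# Assumed reading of "patch16_224": 16-pixel patches on a 224-pixel input.
print(num_patches(224, 16))  # 196 (a 14 x 14 grid)
```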

Learning Rate

The learning rate determines the step size at each iteration while moving toward a minimum of a loss function. It is set to a default of 0.0001 and can be configured to a maximum of 0.05.
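
The effect of the step size can be seen on a toy problem. The sketch below runs plain gradient descent on the one-variable loss (w - 3)^2 using the default and maximum learning rates mentioned above. The loss function and the `gradient_descent` helper are assumptions chosen for illustration; they do not reflect how SmartML trains models.

```python
def gradient_descent(learning_rate, steps=100, start=0.0):
    """Minimize the toy loss (w - 3)**2 and return the final w."""
    w = start
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)     # derivative of (w - 3)**2
        w -= learning_rate * grad  # step size is scaled by the learning rate
    return w


small_step = gradient_descent(0.0001)  # default learning rate
large_step = gradient_descent(0.05)    # maximum learning rate

# After the same number of steps, the larger rate is much closer to the minimum at w = 3.
print(abs(3.0 - small_step) > abs(3.0 - large_step))  # True
```

On this simple loss a larger rate only helps, but on real training losses a rate that is too large can overshoot the minimum and make training unstable.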

Batch Size

The batch size is the number of samples processed before the model is updated. The recommended batch size is 4 to 16, with a default of 16. The batch size should not exceed the total number of images in the training set.
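
Because the model is updated once per batch, the batch size determines how many updates happen in one pass over the training set. The sketch below shows that arithmetic; `updates_per_epoch` is a hypothetical helper, and it assumes the final partial batch is still processed.

```python
import math


def updates_per_epoch(num_training_images, batch_size=16):
    """Number of model updates in one pass over the training set."""
    if batch_size < 1 or batch_size > num_training_images:
        raise ValueError("batch_size must be between 1 and the training set size")
    # Assumption: a final, smaller batch is kept rather than dropped.
    return math.ceil(num_training_images / batch_size)


print(updates_per_epoch(800, 16))  # 50
print(updates_per_epoch(800, 4))   # 200
```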


Resolution

The Resize Strategy is set to Keep Aspect Ratio, with a maximum width that defaults to 800 (configurable up to 1500).
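
The following sketch shows what keeping the aspect ratio under a maximum width can look like. It assumes that wider images are scaled down to the maximum width and that narrower images are left unchanged; this page does not specify either behavior, and `resize_keep_aspect_ratio` is a hypothetical helper.

```python
def resize_keep_aspect_ratio(width, height, max_width=800):
    """Return the (width, height) after limiting the width to max_width."""
    if width <= max_width:
        # Assumption: images already within the limit are not upscaled.
        return width, height
    scale = max_width / width
    return max_width, round(height * scale)


print(resize_keep_aspect_ratio(1600, 1200))  # (800, 600)
print(resize_keep_aspect_ratio(640, 480))    # (640, 480)
```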


Tiling

Tiling subdivides an image into smaller "tiles", runs inference on those tiles, and aggregates the results for the original image. Read more about how to configure tiling options.
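
The subdivide-and-aggregate idea can be sketched in a few lines. This example assumes a simple non-overlapping grid and box-style results; SmartML's actual tiling options (for example tile size or overlap) are not described on this page, and both helper functions are hypothetical.

```python
def tile_boxes(width, height, tile_size):
    """Return (left, top, right, bottom) boxes that cover the image."""
    boxes = []
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            # Tiles on the right and bottom edges are clipped to the image.
            boxes.append((left, top,
                          min(left + tile_size, width),
                          min(top + tile_size, height)))
    return boxes


def to_image_coords(detection, tile):
    """Shift a tile-local (x1, y1, x2, y2) box into full-image coordinates."""
    left, top = tile[0], tile[1]
    x1, y1, x2, y2 = detection
    return (x1 + left, y1 + top, x2 + left, y2 + top)


tiles = tile_boxes(1000, 600, 500)
print(len(tiles))  # 4
# A box found in the bottom-right tile, mapped back to the original image.
print(to_image_coords((10, 20, 50, 60), tiles[3]))  # (510, 520, 550, 560)
```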

Data Augmentation

Read more about Data Augmentation with SmartML and the options configurable for advanced users.