Using Tiling in SmartML model training for small object detection
Computer vision models for object detection generally perform best on relatively large objects. When small objects within high-resolution images need to be detected, SmartML can use tiling. Tiling subdivides an image into smaller “tiles”, runs inference on each tile individually, and aggregates the results for the full image.
Tiling is only effective for object-based detection, where labels are localized rather than applying to the entire image. Therefore, only bounding box and polygon models support the tiling feature. Tiling works best on large, high-resolution images with small and/or sparsely scattered objects.
Tiling can be used to detect small objects in high-resolution images by breaking up an image into smaller image crops.
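The split, infer, aggregate flow described above can be sketched in Python for bounding boxes. This is a minimal illustration, not SmartML's implementation. The `Box` type and the `shift`, `iou` and `aggregate` helpers are hypothetical, and the use of greedy non-maximum suppression (NMS) to merge duplicate detections from overlapping tiles is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A detection in pixel coordinates (hypothetical type for this sketch)."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float
    label: str


def shift(box: Box, dx: int, dy: int) -> Box:
    """Move a tile-local box into full-image coordinates using the tile's top-left offset."""
    return Box(box.x1 + dx, box.y1 + dy, box.x2 + dx, box.y2 + dy, box.score, box.label)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def aggregate(tile_results, iou_threshold=0.5):
    """Merge per-tile detections into one list for the full image.

    tile_results: list of ((tile_x, tile_y), [Box, ...]) with boxes in tile coordinates.
    Duplicates of the same object seen in overlapping tiles are removed with greedy NMS.
    """
    boxes = [shift(b, tx, ty) for (tx, ty), dets in tile_results for b in dets]
    boxes.sort(key=lambda b: b.score, reverse=True)  # highest-confidence boxes win
    kept = []
    for b in boxes:
        if all(b.label != k.label or iou(b, k) < iou_threshold for k in kept):
            kept.append(b)
    return kept


# The same object straddles two overlapping 400px tiles (x offsets 0 and 320).
results = [
    ((0, 0), [Box(350, 100, 390, 140, 0.9, "car")]),
    ((320, 0), [Box(30, 100, 70, 140, 0.8, "car"), Box(200, 200, 220, 220, 0.7, "car")]),
]
merged = aggregate(results)
print([(b.x1, b.y1, b.score) for b in merged])  # [(350, 100, 0.9), (520, 200, 0.7)]
```

The duplicate detection from the second tile maps to the same full-image box as the first and is dropped, while the second object survives with its coordinates shifted by the tile offset.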
Note about training time: Training with tiling can add up to 50% more pre-processing time.
Tiled models need additional time for processing. Set your training length accordingly.
Tiling splits each image into individual crops, so the training dataset grows by a factor of roughly the number of tiles per image. This significantly increases processing time when training tiled models. Our general recommendation when selecting an appropriate training length is to start with 1 hour per 1000 images. To make sure you have budgeted enough time for training, consider selecting a training length that fits the projected size of your tiled dataset.
To do this, we first need to calculate the tile count. Take, for example, a 100-image dataset with a target image size of 4096px x 2160px.
Our tiling settings for this dataset could be:
1. Tile width: 400px
2. Tile width overlap: 20% (80px)
3. Tile height: 400px
4. Tile height overlap: 20% (80px)
To calculate tile count:
1. Get the width tile count (rounded up): target image width / (tile width - width overlap). In our example: 4096 / (400 - 80) = 12.8, rounded up to 13.
2. Get the height tile count (rounded up): target image height / (tile height - height overlap). In our example: 2160 / (400 - 80) = 6.75, rounded up to 7.
3. Total tile count = width tile count x height tile count. In our example: 13 x 7 = 91.
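The tile-count steps above can be expressed as a small Python function. It is a sketch of this article's estimate only; the function name and its parameters are illustrative and are not part of SmartML.

```python
import math


def estimate_tile_count(image_w, image_h, tile_w, tile_h, overlap_w_pct, overlap_h_pct):
    """Estimate tiles per image: ceil(image size / (tile size - overlap)) on each axis."""
    for pct in (overlap_w_pct, overlap_h_pct):
        if not 0 <= pct < 100:
            raise ValueError("overlap must be at least 0% and below 100%")
    overlap_w = tile_w * overlap_w_pct / 100  # e.g. 20% of 400px = 80px
    overlap_h = tile_h * overlap_h_pct / 100
    cols = math.ceil(image_w / (tile_w - overlap_w))  # width tile count, rounded up
    rows = math.ceil(image_h / (tile_h - overlap_h))  # height tile count, rounded up
    return cols, rows, cols * rows


cols, rows, total = estimate_tile_count(4096, 2160, 400, 400, 20, 20)
print(cols, rows, total)  # 13 7 91
```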
To calculate the projected size of your tiled dataset (assuming all dataset images have the same target dimensions), multiply your tile count by the number of images in your dataset. In our example, 100 images x 91 tiles each = 9100 tiles.
To estimate the required training time using our general rule of thumb (1 hour per 1000 images), divide this value by 1000. In our example, 9100 / 1000 = 9.1 hours. Rounding up, we should set the training length to 10 hours to make sure we budget enough time for training.
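The dataset-size and training-time arithmetic above can be combined into one Python helper. This is a sketch that, like the example, assumes every image has the same target dimensions (and so yields the same number of tiles) and rounds the estimate up to a whole hour; the function and constant names are hypothetical.

```python
import math

HOURS_PER_1000_IMAGES = 1.0  # the general rule of thumb from this article


def estimate_training_hours(num_images, tiles_per_image, hours_per_1000=HOURS_PER_1000_IMAGES):
    """Return (projected tiled dataset size, raw hour estimate, hours rounded up)."""
    tiled_size = num_images * tiles_per_image  # every image yields the same number of tiles
    raw_hours = tiled_size / 1000 * hours_per_1000
    return tiled_size, raw_hours, math.ceil(raw_hours)


size, raw, budget = estimate_training_hours(100, 91)
print(size, raw, budget)  # 9100 9.1 10
```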
Tiling can be configured when adding a model or model version.
1. In the "Add New Model" or "Add New Model Version" screen, make your dataset options and model output selections.
2. Open the SmartML Hyperparameters toggle to view the advanced settings.
3. Scroll down and click the Use custom tiling toggle to enable the tiling feature.
4. Set your custom tiling options:
Tile Size - Height and width of each tile. Accepted range is 200px - 800px with a default of 800px.
The tile height and width entered cannot be larger than the height and width of any image in the dataset. A tile must be small enough to fit within the dimensions of every image.
Custom Tile Overlap (Stride) - Specify the tile overlap or stride. This value is a percentage of the tile dimension and defaults to 50%.
What is tile overlap (Stride)?
Here, stride is expressed as the amount of overlap between adjacent tiles. Tiling generates image crops using a sliding window, so neighboring tiles overlap. This helps ensure that an object lying on a tile border appears whole in at least one tile instead of being cut off. See the animation below to understand how the stride size works in relation to the sliding window.
Sliding window for each tile is based on stride, or tile overlap.
5. Once you are satisfied with your model training settings, scroll to the bottom and click "Save and Start Training".
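The Tile Size and Custom Tile Overlap options described above can be illustrated with a Python sketch. `check_tile_size` applies the documented 200px to 800px range and the rule that a tile must fit inside every image in the dataset; `tile_origins` shows how a sliding window with a percentage overlap places tiles along one axis. Both are hypothetical helpers, not SmartML APIs, and the handling of the last tile at an image edge is an assumption (this sketch shifts it back so it ends exactly at the edge rather than padding).

```python
MIN_TILE_PX, MAX_TILE_PX = 200, 800  # accepted tile size range stated above


def check_tile_size(tile_w, tile_h, image_sizes):
    """Return a list of problems with the tile size; an empty list means it is acceptable."""
    problems = []
    for name, value in (("width", tile_w), ("height", tile_h)):
        if not MIN_TILE_PX <= value <= MAX_TILE_PX:
            problems.append(f"tile {name} {value}px is outside {MIN_TILE_PX}-{MAX_TILE_PX}px")
    if image_sizes:
        smallest_w = min(w for w, _ in image_sizes)
        smallest_h = min(h for _, h in image_sizes)
        if tile_w > smallest_w:
            problems.append(f"tile width {tile_w}px exceeds smallest image width {smallest_w}px")
        if tile_h > smallest_h:
            problems.append(f"tile height {tile_h}px exceeds smallest image height {smallest_h}px")
    return problems


def tile_origins(length, tile, overlap_pct):
    """Start offsets of a sliding window along one axis.

    Assumption: the final tile is shifted back to end at the image edge instead of padding.
    """
    if tile > length:
        raise ValueError("tile must fit inside the image")
    if not 0 <= overlap_pct < 100:
        raise ValueError("overlap must be at least 0% and below 100%")
    stride = max(1, tile - round(tile * overlap_pct / 100))  # 400px tile, 20% overlap -> 320px step
    origins = list(range(0, length - tile + 1, stride))
    if origins[-1] + tile < length:
        origins.append(length - tile)  # cover the strip left at the far edge
    return origins


sizes = [(4096, 2160), (1920, 1080), (640, 480)]
print(check_tile_size(800, 800, sizes))  # flags both dimensions: 800px does not fit the 640x480 image
print(check_tile_size(400, 400, sizes))  # []

xs = tile_origins(4096, 400, 20)
ys = tile_origins(2160, 400, 20)
print(len(xs), len(ys), len(xs) * len(ys))  # 13 7 91
print(xs[-2:], ys[-2:])  # [3520, 3696] [1600, 1760]
```

For the 4096px x 2160px example, this window placement yields the same 13 x 7 = 91 tiles as the rounded-up tile count formula earlier in this article. For other image sizes the two counts can differ, so the formula is best treated as a budgeting estimate.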