Export labeled datasets to use in ML projects
- 1. Go to the "Versions" tab.
- 2. Lock your dataset version by clicking Lock under the "Actions" column. If there are any unapproved labels, you will be prompted to auto-approve all of them. If you would rather review the images individually, click "Cancel" and return to the Review tab to review your labels.
- 3. Click Export under the "Actions" column of the newly locked version.
- 4. The Export modal opens. On the left, select the desired export format: Plainsight, CreateML Classifier, CreateML Object Detector, COCO, YOLO, or Pascal VOC. Note that some formats are only compatible with certain label types. Read about the formats below.
- 5. Enter your Split Dataset percentages for Train, Test, and Validate. They default to 80%, 10%, and 10% respectively. Read more about the concept of splitting datasets.
- 6. Leave the Include image files in export box checked if you want the image files included in your export.
- 7. Check the Randomize the order box to randomize how images are assigned to each split.
- 8. Check the Limit number of images box to specify a limit on how many images are exported.
- 9. Select or de-select the desired label types to export, if applicable.
- 10. Click Export Now.
You will be notified by email and in-app notification when your export is ready for download. A signed URL will be provided to download your data.
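Once the signed URL arrives, the download can be scripted. The following is a minimal sketch using only the Python standard library; the function name `download_export` and the destination path are hypothetical, and it assumes the signed URL can be fetched with a plain GET request and no extra headers.

```python
import shutil
import urllib.request
from pathlib import Path


def download_export(signed_url: str, destination: str) -> Path:
    """Stream the file behind `signed_url` to `destination` and return its path.

    Assumption: the signed URL carries its own credentials, so a plain GET
    without authentication headers is enough.
    """
    target = Path(destination)
    # Create the destination folder if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    # Copy the response to disk in chunks so large exports are not
    # loaded into memory all at once.
    with urllib.request.urlopen(signed_url) as response, target.open("wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Signed URLs are usually valid for a limited time only, so it is safest to download the export soon after the notification arrives.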
Plainsight supports several popular formats for exporting labeled datasets.
- Plainsight - Our own JSON format for labeled datasets. Users can export all label types in this format.
- Create ML Classifier - A format for Create ML, Apple's machine learning model creation and training framework. Only Classification label types can be exported in this format.
- Create ML Object Detection - A format for Create ML, Apple's machine learning model creation and training framework. Only Rectangle (Bounding box) label types can be exported in this format.
- COCO - A large-scale object detection, segmentation, and captioning dataset. All label types supported by Plainsight can be exported to this format.
- YOLO - a real-time object detection algorithm. Only Rectangles and Polygons can be exported to this format.
- Pascal VOC - “Pattern Analysis, Statistical Modeling and Computational Learning Visual Object Classes” format is the input to the Pascal object detector. Rectangle and Polygon (converted to Rectangle) are supported.
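The formats above describe the same rectangle in different ways. The sketch below illustrates the conventions these formats commonly use: Pascal VOC stores corner coordinates (xmin, ymin, xmax, ymax), COCO stores [x, y, width, height] from the top-left corner, and YOLO stores a center point and size normalized by the image dimensions. The helper names are hypothetical, and it is an assumption that a polygon is converted to the tightest axis-aligned rectangle around its points; the exact output of a Plainsight export may differ.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), Pascal VOC style


def polygon_to_rectangle(points: Sequence[Point]) -> Box:
    """Return the tightest axis-aligned box around a polygon (assumed conversion)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def to_coco(box: Box) -> List[float]:
    """Convert corner coordinates to COCO's [x, y, width, height]."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]


def to_yolo(box: Box, image_width: int, image_height: int) -> Box:
    """Convert corner coordinates to YOLO's normalized (cx, cy, w, h)."""
    xmin, ymin, xmax, ymax = box
    return (
        (xmin + xmax) / 2 / image_width,
        (ymin + ymax) / 2 / image_height,
        (xmax - xmin) / image_width,
        (ymax - ymin) / image_height,
    )


# A four-point polygon label on a 400 x 400 pixel image.
polygon = [(100, 50), (300, 80), (250, 200), (120, 180)]
voc_box = polygon_to_rectangle(polygon)   # (100, 50, 300, 200)
coco_box = to_coco(voc_box)               # [100, 50, 200, 150]
yolo_box = to_yolo(voc_box, 400, 400)     # (0.5, 0.3125, 0.5, 0.375)
```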
Plainsight allows you to utilize dataset splits when exporting your labels.
Split Dataset options
When training a computer vision or deep learning model, it's common practice to use three separate datasets: train, validation, and test.
Train datasets are composed of the data that's actually used to train a model. This is the data you want the model to see and learn from.
Validation datasets are used to see how your model is doing while training. Usually after a certain number of epochs (data cycles), the model is run on the validation dataset and returns an accuracy/loss score. Since it's never seen this data before, seeing accuracy going up and loss going down is a good indicator that your model is learning correct patterns and will generalize well to new data it's never seen. You can adjust hyperparameters based on the output of the model on this data. A hyperparameter is a parameter whose value is used to control the learning process.
The test dataset is a holdout set that should be used only once you have completely trained a model and want to verify that it works on data it has never seen. It differs from the validation dataset in that you should not tweak hyperparameters to fit the model to the test data.
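To make the split options concrete, here is a minimal sketch of how a list of images could be divided by percentage, with optional randomization and an image limit. It is not Plainsight's implementation: the function name `split_dataset`, the choice to shuffle before applying the limit, and the choice to place rounding leftovers in the validation split are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Optional, Sequence


def split_dataset(
    items: Sequence[str],
    train: int = 80,
    test: int = 10,
    validate: int = 10,
    randomize: bool = False,
    limit: Optional[int] = None,
    seed: Optional[int] = None,
) -> Dict[str, List[str]]:
    """Divide `items` into train/test/validate lists by whole percentages."""
    if train + test + validate != 100:
        raise ValueError("Split percentages must add up to 100")
    selected = list(items)
    if randomize:
        # A fixed seed makes the shuffle reproducible between runs.
        random.Random(seed).shuffle(selected)
    if limit is not None:
        # Assumption: the limit is applied after shuffling.
        selected = selected[:limit]
    n_train = len(selected) * train // 100
    n_test = len(selected) * test // 100
    return {
        "train": selected[:n_train],
        "test": selected[n_train:n_train + n_test],
        # Assumption: any remainder from rounding down goes to validation.
        "validate": selected[n_train + n_test:],
    }


images = [f"image_{i:03d}.jpg" for i in range(100)]
splits = split_dataset(images, randomize=True, seed=42)
sizes = {name: len(files) for name, files in splits.items()}
# sizes is {"train": 80, "test": 10, "validate": 10}
```

Every image lands in exactly one split, which keeps the test data unseen during training and tuning.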