Protege Pipelines
The Protege Pipelines package provides orchestration logic and job specification tooling for training machine learning models across multiple tasks, datasets, and environments.
It offers a CLI for launching training workflows using TOML-based job specifications, integrating with cloud backends and pipeline engines.
✨ Features
- Job Specification-Driven
  - Declarative training config via TOML.
  - Supports datasets, augmentations, schedulers, and export settings.
- Task-Agnostic Pipelines
  - Classification, detection, segmentation, and keypoint workflows.
  - Modular support for Encord datasets and Google Cloud runners.
- CLI Entry Point
  - One command to launch a full training pipeline.
  - Includes Vertex AI support and caching logic.
- Cloud Integration
  - GCP-first orchestration with disk, GPU, and machine type config.
  - Artifact handling and bucket export support built-in.
🛠️ Use Cases
- Launching training jobs from a structured spec.
- Building CI/CD flows for retraining models.
- Integrating with job schedulers like Vertex AI Pipelines.
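For the CI/CD use case, one plausible step is to check a job spec before any training job is launched, so a malformed spec fails fast in the pipeline. The sketch below assumes the spec has already been parsed into a dictionary; the required section names and the `validate_spec` helper are hypothetical, while the task names mirror the workflows listed under Features.

```python
from typing import Any

# Hypothetical required sections; the real spec schema may differ.
REQUIRED_SECTIONS = ("job", "dataset", "export")

# Task names taken from the workflows listed in the Features section.
SUPPORTED_TASKS = {"classification", "detection", "segmentation", "keypoint"}


def validate_spec(spec: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    for section in REQUIRED_SECTIONS:
        if section not in spec:
            errors.append(f"missing section: [{section}]")
    task = spec.get("job", {}).get("task")
    if task is not None and task not in SUPPORTED_TASKS:
        errors.append(f"unsupported task: {task!r}")
    return errors


good_spec = {"job": {"task": "detection"}, "dataset": {}, "export": {}}
bad_spec = {"job": {"task": "translation"}}

print(validate_spec(good_spec))  # no problems expected
print(validate_spec(bad_spec))   # two missing sections and one bad task
```

A CI job could exit with a non-zero status whenever the returned list is non-empty, blocking the retraining run until the spec is fixed.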