Protege Pipelines

The Protege Pipelines package provides orchestration logic and job specification tooling for training machine learning models across multiple tasks, datasets, and environments.

It offers a CLI for launching training workflows from TOML-based job specifications, and it integrates with cloud backends and pipeline engines such as Vertex AI Pipelines.

✨ Features

  • Job Specification-Driven

    • Declarative training config via TOML.
    • Supports datasets, augmentations, schedulers, and export settings.
  • Task-Agnostic Pipelines

    • Classification, detection, segmentation, and keypoint workflows.
    • Modular support for Encord datasets and Google Cloud runners.
  • CLI Entry Point

    • One command to launch a full training pipeline.
    • Includes Vertex AI support and caching logic.
  • Cloud Integration

    • GCP-first orchestration with configurable disk, GPU, and machine type.
    • Artifact handling and bucket export support built-in.
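
To make the job-specification idea concrete, here is a minimal sketch of what a TOML spec covering the settings listed above might look like, and how it could be parsed. Every table name, key, and value (`[job]`, `[dataset]`, `[training]`, `[export]`, `[cloud]`, the bucket path, the machine and GPU names) is a hypothetical illustration, not the package's actual schema, and `load_job_spec` is a made-up helper rather than part of the Protege Pipelines API.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ImportError:  # older interpreters: assumes the tomli backport is installed
    import tomli as tomllib

# Hypothetical job spec: all tables and keys below are illustrative
# assumptions, not the schema Protege Pipelines actually uses.
SPEC_TOML = """
[job]
name = "defect-classifier"
task = "classification"

[dataset]
source = "encord"
augmentations = ["horizontal_flip", "color_jitter"]

[training]
epochs = 20
scheduler = "cosine"

[export]
bucket = "gs://example-bucket/models"

[cloud]
machine_type = "n1-standard-8"
gpu_type = "NVIDIA_TESLA_T4"
gpu_count = 1
disk_size_gb = 200
"""


def load_job_spec(text: str) -> dict:
    """Parse a TOML job spec string into a plain dictionary."""
    return tomllib.loads(text)


spec = load_job_spec(SPEC_TOML)
print(spec["job"]["task"], spec["cloud"]["machine_type"])
```

Keeping the spec declarative means the same file can be reviewed, versioned, and reused across environments; only the code that reads it needs to know how to launch the job.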

🛠️ Use Cases

  • Launching training jobs from a structured spec.
  • Building CI/CD flows for retraining models.
  • Integrating with job schedulers like Vertex AI Pipelines.
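
For the CI/CD use case, a retraining flow might check a spec for completeness before submitting it. The sketch below shows one way to do that on an already-parsed spec. The list of required sections and the `missing_sections` helper are hypothetical and only illustrate the idea; the package may validate specs differently.

```python
from typing import Iterable, List

# Hypothetical set of required top-level tables; the real schema may differ.
REQUIRED_SECTIONS = ("job", "dataset", "training", "export")


def missing_sections(spec: dict, required: Iterable[str] = REQUIRED_SECTIONS) -> List[str]:
    """Return the required top-level sections that are absent from a parsed spec."""
    return [name for name in required if name not in spec]


# Example: a parsed spec that lacks an export section.
incomplete_spec = {"job": {"task": "detection"}, "dataset": {}, "training": {}}
problems = missing_sections(incomplete_spec)
if problems:
    print("Spec is missing sections:", ", ".join(problems))
```

A CI step could fail the build whenever the returned list is non-empty, so that malformed specs never reach the training backend.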