ChatTag

The ChatTag Filter uses OpenAI's ChatGPT Vision API to automatically analyze and annotate images across diverse domains. It can detect objects, classify content, and generate bounding boxes with confidence scores, integrating seamlessly with OpenFilter pipelines.

This document is automatically published to production documentation on every production release.


✨ Key Features

  • AI-powered image annotation using OpenAI's ChatGPT Vision API
  • Multi-domain support for any image classification task (food, pets, medical, industrial, etc.)
  • Configurable prompts for different annotation requirements
  • Bounding box detection with precise normalized coordinates
  • Confidence scoring for each annotation with quality validation
  • Cost optimization through smart image resizing and quality settings
  • Multiple output formats including binary classification, COCO detection, and JSONL datasets
  • Environment-variable-based configuration for runtime overrides
  • Web visualization interface for real-time monitoring and debugging

🚀 Usage Modes

1. Standard Annotation Mode

Given a frame, the filter performs AI analysis and returns structured annotations. The output is stored in:

frame.data['meta']['chatgpt_annotator']

Each annotation includes:

{
  "annotations": {
    "avocado": {
      "present": true,
      "confidence": 0.95,
      "bbox": [0.2, 0.3, 0.4, 0.5]
    }
  },
  "usage": {
    "input_tokens": 26088,
    "output_tokens": 107,
    "total_tokens": 26195
  },
  "processing_time": 2.34
}
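To illustrate how downstream code might consume this structure, here is a minimal sketch that filters annotations by confidence and scales the normalized bounding boxes to pixel coordinates. The helper name `extract_detections` and the `[x, y, width, height]` interpretation of `bbox` are assumptions for illustration; verify the box convention against your prompt and schema before relying on it.

```python
def extract_detections(meta, frame_w, frame_h, min_conf=0.9):
    """Collect confident positive annotations, scaling bboxes to pixels.

    Assumes bbox is [x, y, width, height] in normalized 0-1 coordinates;
    check the filter's actual output against your schema.
    """
    detections = []
    for label, ann in meta.get("annotations", {}).items():
        if not ann.get("present") or ann.get("confidence", 0.0) < min_conf:
            continue
        bbox = ann.get("bbox")
        pixel_bbox = None
        if bbox is not None:
            x, y, w, h = bbox
            pixel_bbox = [x * frame_w, y * frame_h, w * frame_w, h * frame_h]
        detections.append(
            {"label": label, "confidence": ann["confidence"], "bbox": pixel_bbox}
        )
    return detections

# Example using the annotation structure shown above, on a 640x480 frame
meta = {
    "annotations": {
        "avocado": {"present": True, "confidence": 0.95, "bbox": [0.2, 0.3, 0.4, 0.5]}
    }
}
print(extract_detections(meta, frame_w=640, frame_h=480))
```

In a pipeline, `meta` would come from `frame.data['meta']['chatgpt_annotator']` rather than a literal dict.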

2. Dataset Generation Mode

If enabled via save_frames, the filter generates multiple dataset formats:

  • Binary Classification: binary_datasets/item_name_labels.json
  • COCO Detection: detection_datasets/annotations.json
  • JSONL Results: labels.jsonl
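As a quick sanity check on a generated run, the JSONL results can be scanned line by line. The sketch below assumes each line of labels.jsonl is a JSON object with an "annotations" mapping shaped like the per-frame output shown earlier; the helper name is hypothetical.

```python
import json

def summarize_jsonl(path, min_conf=0.9):
    """Count records that contain at least one confident positive label.

    Assumes one JSON object per line with an "annotations" mapping like
    the per-frame output above; adapt the field names to your schema.
    """
    positives = 0
    total = 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            total += 1
            record = json.loads(line)
            anns = record.get("annotations", {})
            if any(a.get("present") and a.get("confidence", 0.0) >= min_conf
                   for a in anns.values()):
                positives += 1
    return positives, total
```

This gives a rough positive/total ratio, which is useful for spotting an obviously unbalanced or empty run before training on it.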

3. No-ops Testing Mode

If no_ops=true, the filter skips API calls and returns default annotations, allowing pipelines to be tested without incurring API costs.


⚙️ Configuration Options

| Field | Type | Description |
| --- | --- | --- |
| chatgpt_api_key | str | OpenAI API key (required) |
| prompt | str | Path to prompt file (required) |
| output_schema | dict | Expected output format schema |
| chatgpt_model | str | Model to use (default: gpt-4o-mini) |
| confidence_threshold | float | Minimum confidence for a positive classification (default: 0.9) |
| max_image_size | int | Maximum image size for cost optimization (default: 0 = keep original) |
| save_frames | bool | Whether to save results and generate datasets |
| output_dir | str | Output directory for saved results |
| no_ops | bool | Skip API calls for testing (default: false) |

All fields are configurable via code or environment variables prefixed with FILTER_. Example: FILTER_CHATGPT_API_KEY=sk-your-key-here
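For instance, several of the fields from the table can be overridden at launch time from the shell. The variable names below follow the FILTER_ prefix convention shown above; confirm the exact mapping for each field against the filter's configuration loader.

```shell
# Override configuration at runtime without touching code
export FILTER_CHATGPT_API_KEY="sk-your-key-here"
export FILTER_CHATGPT_MODEL="gpt-4o-mini"
export FILTER_CONFIDENCE_THRESHOLD="0.8"
export FILTER_NO_OPS="true"   # skip API calls while testing
```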


🧪 Example Configurations

Basic annotation

FilterChatgptAnnotatorConfig(
    chatgpt_api_key="sk-your-key-here",
    prompt="./prompts/food_prompt.txt",
    output_schema={
        "avocado": {"present": False, "confidence": 0.0, "bbox": None},
        "lettuce": {"present": False, "confidence": 0.0, "bbox": None}
    }
)

With dataset generation

FilterChatgptAnnotatorConfig(
    chatgpt_api_key="sk-your-key-here",
    prompt="./prompts/food_prompt.txt",
    save_frames=True,
    output_dir="./output_frames",
    confidence_threshold=0.8,
    max_image_size=512
)

🧠 Output Behavior

  • frame.data['meta']['chatgpt_annotator'] includes annotation results with confidence scores
  • If save_frames = True, datasets are generated in multiple formats (binary, COCO, JSONL)
  • If no_ops = True, default annotations are used without API calls
  • Automatic quality validation and error handling for malformed responses

🧩 Integration

Use the filter directly:

from filter_chatgpt_annotator.filter import FilterChatgptAnnotator
FilterChatgptAnnotator.run()

Or as part of a multi-stage pipeline:

Filter.run_multi([
    (VideoIn, {...}),
    (FilterChatgptAnnotator, {...}),
    (Webvis, {...})
])

🧼 Notes

  • Supports both classification and object detection tasks based on output schema
  • Automatically generates balanced datasets for training
  • Cost optimization through configurable image resizing and quality settings
  • Built-in web interface available at http://localhost:8000 for monitoring
  • Comprehensive error handling and quality validation

For detailed guidance, revisit the Configuration Options and Example Configurations sections above, which walk through the same workflows step by step.