
Optical Character Recognition (OCR) Filter

The FilterOpticalCharacterRecognition is a pluggable filter that extracts text from image frames using Optical Character Recognition (OCR). It supports multiple OCR backends and offers flexible configuration for language support, output, and debug logging.

Features

  • Dual OCR Engine Support
    Choose between EasyOCR (the default) and Tesseract using the ocr_engine option.

  • Multi-language OCR
    Use the ocr_language option to specify one or more language codes (e.g., en,fr).

  • Topic-based Processing

    • Filter frames by topic using topic_pattern regex
    • Exclude specific topics using exclude_topics list
    • Support for exact topic names or regex patterns in exclusions
  • Flexible Output Options

    • Write results to a JSON file (configurable via write_output_file)
    • Forward OCR results in frame metadata (configurable via forward_ocr_texts)
    • Results are written to output_json_path as newline-delimited JSON
  • Debug Mode
    Setting debug: true increases logging verbosity to aid troubleshooting.

  • Frame-level Skipping
    Add the metadata flag skip_ocr: true to individual frames to bypass OCR processing.

  • Custom Tesseract Path
    You can specify a custom tesseract_cmd binary path if using the Tesseract engine (defaults to a bundled AppImage).

  • Safe Streaming Output
    Results are flushed to disk immediately after processing each frame.

    Note

    Flushing after every frame can cause heavy disk I/O. A configurable flushing strategy is planned for a future release.
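The topic and frame selection rules above can be combined into a single predicate. This is a minimal sketch, not the filter's actual code. The helper name should_process is hypothetical. It also assumes that topic_pattern can match anywhere in the topic name, while a regex entry in exclude_topics must match the whole name.

```python
import re
from typing import Iterable, Optional


def should_process(topic: str,
                   meta: dict,
                   topic_pattern: Optional[str] = None,
                   exclude_topics: Iterable[str] = ()) -> bool:
    """Decide whether a frame on `topic` goes through OCR (hypothetical helper)."""
    # Per-frame opt-out: frames whose metadata sets skip_ocr are bypassed.
    if meta.get("skip_ocr"):
        return False
    for excluded in exclude_topics:
        # An exclusion can be an exact topic name...
        if topic == excluded:
            return False
        # ...or a regex that must match the whole topic name (assumption).
        try:
            if re.fullmatch(excluded, topic):
                return False
        except re.error:
            pass  # not a valid regex, so it only counts as an exact name
    # Without a topic_pattern, every non-excluded topic is processed.
    if topic_pattern is None:
        return True
    # topic_pattern may match anywhere in the topic name (assumption).
    return re.search(topic_pattern, topic) is not None
```

All three checks must pass, so their order does not change the result. Checking skip_ocr first just avoids any regex work for frames that have opted out.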

Example Output

Each processed frame will produce a JSON line similar to:

{
  "topic": "camera",
  "frame_id": "abc123",
  "texts": ["Detected text line 1", "Detected text line 2"]
}
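Each record occupies a single line on disk, since the format is newline-delimited JSON; the example above is only spread over several lines for readability. The sketch below shows one way to append such records with an immediate flush and read them back. The function names append_ocr_result and read_ocr_results are hypothetical. Calling os.fsync after each write is an assumption about what "flushed to disk" means, and the filter itself may keep the file open instead of reopening it for every frame.

```python
import json
import os
from typing import List


def append_ocr_result(path: str, topic: str, frame_id: str, texts: List[str]) -> None:
    """Append one OCR result as a single JSON line and flush it (sketch)."""
    # Create the output directory (e.g. ./output) on first use.
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    record = {"topic": topic, "frame_id": frame_id, "texts": texts}
    with open(path, "a", encoding="utf-8") as f:
        # One compact JSON object per line is what makes the file newline-delimited JSON.
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
        f.flush()
        # fsync also pushes the OS buffers to disk; this per-frame cost is the I/O load the note warns about.
        os.fsync(f.fileno())


def read_ocr_results(path: str) -> List[dict]:
    """Read every result back, one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Reading line by line means a consumer can process a partially written file without parsing it as one large JSON document.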

When forwarding results in metadata, they are stored under the ocr_texts key in the frame metadata, with topics as keys:

{
  "meta": {
    "ocr_texts": {
      "camera": ["Detected text line 1", "Detected text line 2"],
      "thermal": ["Temperature: 25°C"]
    }
  }
}
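The sketch below shows one way to produce this structure by merging per-topic results into a frame's metadata without mutating the input. attach_ocr_texts is a hypothetical helper. Modeling the frame as a plain dict with a meta key is an assumption based on the example, not the pipeline's actual frame type.

```python
from typing import Dict, List


def attach_ocr_texts(frame_data: dict, results: Dict[str, List[str]]) -> dict:
    """Return a copy of frame_data with results merged into meta["ocr_texts"] (sketch)."""
    data = dict(frame_data)                       # shallow copy of the frame payload
    meta = dict(data.get("meta", {}))             # copy existing metadata, if any
    ocr_texts = dict(meta.get("ocr_texts", {}))   # keep results already forwarded for other topics
    for topic, texts in results.items():
        ocr_texts[topic] = list(texts)
    meta["ocr_texts"] = ocr_texts
    data["meta"] = meta
    return data
```

A downstream step could then read one camera's text with data["meta"]["ocr_texts"].get("camera", []), which returns an empty list when that topic produced no results.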

When to Use

This filter is ideal for any pipeline that requires reading printed or handwritten text from images, such as:

  • Scanned documents
  • Signboards or product packaging in photos
  • Scene text in videos
  • Multi-camera systems with different text sources

Configuration Reference

| Key               | Type     | Default                     | Description                                      |
|-------------------|----------|-----------------------------|--------------------------------------------------|
| ocr_engine        | string   | "easyocr"                   | OCR engine to use: "tesseract" or "easyocr"      |
| ocr_language      | string[] | ["en"]                      | List of language codes for OCR                   |
| output_json_path  | string   | "./output/ocr_results.json" | Path to save output results                      |
| debug             | boolean  | false                       | Enable debug logging                             |
| tesseract_cmd     | string   | Packaged AppImage path      | Path to Tesseract binary                         |
| forward_ocr_texts | boolean  | true                        | Whether to forward OCR results in frame metadata |
| write_output_file | boolean  | false                       | Whether to write results to output file          |
| topic_pattern     | string   | null                        | Regex pattern to match topic names               |
| exclude_topics    | string[] | []                          | List of topics to exclude from OCR processing    |
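To make the defaults concrete, the table can be mirrored as a Python dataclass. This is an illustrative sketch only. OCRConfig is a hypothetical name, not the filter's own configuration class. Setting tesseract_cmd=None stands in for the bundled AppImage path. The validation rules in __post_init__ are assumptions, not documented behavior.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OCRConfig:
    """Hypothetical mirror of the documented options and their defaults."""
    ocr_engine: str = "easyocr"
    ocr_language: List[str] = field(default_factory=lambda: ["en"])
    output_json_path: str = "./output/ocr_results.json"
    debug: bool = False
    tesseract_cmd: Optional[str] = None  # None stands in for the bundled AppImage path
    forward_ocr_texts: bool = True
    write_output_file: bool = False
    topic_pattern: Optional[str] = None
    exclude_topics: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumed checks: fail fast on settings the filter could not use.
        if self.ocr_engine not in ("tesseract", "easyocr"):
            raise ValueError(f"ocr_engine must be 'tesseract' or 'easyocr', got {self.ocr_engine!r}")
        if not self.ocr_language:
            raise ValueError("ocr_language must contain at least one language code")
        if self.topic_pattern is not None:
            re.compile(self.topic_pattern)  # raises re.error on an invalid regex
```

default_factory gives each instance its own list, so changing one config's ocr_language does not affect another's.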

Environment Variables

All configuration options can be overridden with environment variables named FILTER_ followed by the option name in uppercase. For example:

  • FILTER_OCR_ENGINE
  • FILTER_OCR_LANGUAGE
  • FILTER_DEBUG
  • FILTER_TOPIC_PATTERN
  • FILTER_EXCLUDE_TOPICS

Boolean values should be set to "true" or "false" (case-insensitive). List values should be comma-separated strings.
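The override rules can be sketched as follows. The sketch assumes each variable name is FILTER_ plus the option name in uppercase, as in the examples above. apply_env_overrides is a hypothetical helper that uses each option's default value to decide how to parse the string. Rejecting anything other than true/false for booleans is also an assumption.

```python
import os
from typing import Any, Dict, Mapping, Optional


def apply_env_overrides(config: Dict[str, Any],
                        environ: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Return a copy of config with FILTER_<OPTION> variables applied (sketch)."""
    env = os.environ if environ is None else environ
    result = dict(config)
    for key, default in config.items():
        name = "FILTER_" + key.upper()
        raw = env.get(name)
        if raw is None:
            continue  # no override for this option
        if isinstance(default, bool):
            # Booleans: "true" or "false", case-insensitive.
            lowered = raw.strip().lower()
            if lowered not in ("true", "false"):
                raise ValueError(f"{name} must be 'true' or 'false', got {raw!r}")
            result[key] = lowered == "true"
        elif isinstance(default, list):
            # Lists: comma-separated; whitespace and empty items are dropped.
            result[key] = [item.strip() for item in raw.split(",") if item.strip()]
        else:
            # Strings (and options defaulting to None) are taken verbatim.
            result[key] = raw
    return result
```

One consequence of comma-separated lists: a regex in FILTER_EXCLUDE_TOPICS that itself contains a comma, such as a {2,3} quantifier, would be split apart. Such patterns are safer to set through regular configuration than through the environment.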