ShutterSage-AI: Intelligent Photo Metadata Tagging
An AI-powered CLI tool for zero-shot image metadata tagging with NPU acceleration, supporting RAW and standard formats.
Photography generates thousands of images, but organizing them with descriptive tags is often a tedious, manual process. As an avid photographer myself, I found my image library becoming increasingly scattered and difficult to navigate. ShutterSage-AI was born from this personal frustration: a tool built to bring vision models directly into my local workflow for automated, intelligent tagging.
Project Overview
ShutterSage-AI is a command-line interface designed for professional photographers and enthusiasts who need to handle large volumes of high-resolution imagery. I built this specifically to solve the "scattered library" problem by automating the metadata tagging process at the source.
- Zero-Shot Analysis: Uses OpenAI's CLIP (Contrastive Language-Image Pre-training) to score each image against candidate tags without any task-specific training.
- Hardware Optimized: Engineered to use the full capability of my Snapdragon Copilot+ PC, using NPU acceleration via DirectML to speed up image analysis on modern ARM-based laptops.
- Lossless Workflow: Instead of modifying the image files, ShutterSage-AI writes tags to the dc:subject field of XMP sidecar files, preserving the integrity of the original RAW files.
- Batch Processing: Designed for efficiency, it can process entire directories of mixed media types in minutes.
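To illustrate the sidecar approach, here is a minimal sketch of writing tags into a dc:subject bag using only the Python standard library. The function names (`build_xmp`, `sidecar_path_for`, `write_sidecar`) and the `IMG_0001.xmp` naming convention are assumptions for illustration, not necessarily how ShutterSage-AI does it.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Standard XMP, RDF, and Dublin Core namespace URIs.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def build_xmp(tags):
    """Return an XMP document (as a string) listing the tags in dc:subject."""
    xmpmeta = ET.Element(f"{{{NS['x']}}}xmpmeta")
    rdf = ET.SubElement(xmpmeta, f"{{{NS['rdf']}}}RDF")
    desc = ET.SubElement(
        rdf, f"{{{NS['rdf']}}}Description", {f"{{{NS['rdf']}}}about": ""}
    )
    subject = ET.SubElement(desc, f"{{{NS['dc']}}}subject")
    bag = ET.SubElement(subject, f"{{{NS['rdf']}}}Bag")  # unordered keyword list
    for tag in tags:
        ET.SubElement(bag, f"{{{NS['rdf']}}}li").text = tag
    return ET.tostring(xmpmeta, encoding="unicode")


def sidecar_path_for(image_path):
    """Assumed convention: IMG_0001.CR2 -> IMG_0001.xmp in the same folder."""
    return Path(image_path).with_suffix(".xmp")


def write_sidecar(image_path, tags):
    """Write the sidecar next to the image; the image itself is never opened."""
    sidecar = sidecar_path_for(image_path)
    sidecar.write_text(build_xmp(tags), encoding="utf-8")
    return sidecar
```

Because only the `.xmp` file is written, the original RAW file stays byte-for-byte unchanged. Some applications expect a different sidecar name (for example `IMG_0001.CR2.xmp`), so the naming helper is the part most likely to need adjusting.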
Technologies
- Core Engine: Python
- Vision Model: CLIP Large
- Deep Learning: PyTorch
- Acceleration: DirectML (Windows AI)
Performance Showcase
One of the primary goals was to make high-fidelity AI tagging accessible on mobile workstations. By optimizing the inference pipeline for the NPU, ShutterSage-AI tags images faster than CPU-only inference with minimal battery impact.
To achieve this, I focused on three technical pillars:
- Hardware Acceleration via DirectML: Instead of relying on traditional CPU processing, I implemented a DirectML backend. This allows ShutterSage-AI to tap directly into the dedicated NPU cores of the Snapdragon architecture, offloading the heavy CLIP model inference from the power-hungry CPU/GPU.
- Optimized Inference Pipeline: I tuned the data ingestion pipeline to handle asynchronous pre-processing. Image resizing and normalization run on multiple threads, so the NPU stays busy instead of waiting for input, resulting in a 4x speed improvement compared to standard CPU-only tagging.
- Local Memory Efficiency: By managing tensors efficiently within the application’s memory space, I reduced data shuffling between the system RAM and the NPU, which is critical for maintaining high performance on slim mobile workstations without triggering thermal throttling.
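The asynchronous pre-processing idea can be sketched with a thread pool that prepares upcoming images while the current one is being analyzed. This is a simplified illustration under stated assumptions: `prefetch`, `workers`, and `depth` are hypothetical names, and `preprocess` stands in for whatever resizing and normalization function the real pipeline uses.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def prefetch(paths, preprocess, workers=4, depth=8):
    """Yield (path, preprocessed) pairs in input order.

    Up to `depth` items are in flight at once, so worker threads keep
    preparing upcoming images while the consumer runs inference.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = deque()
        for path in paths:
            pending.append((path, pool.submit(preprocess, path)))
            if len(pending) >= depth:
                done_path, future = pending.popleft()
                yield done_path, future.result()
        # Drain whatever is still in flight once the input is exhausted.
        while pending:
            done_path, future = pending.popleft()
            yield done_path, future.result()


if __name__ == "__main__":
    # str.upper is a placeholder for real image pre-processing.
    for path, prepared in prefetch(["a.jpg", "b.jpg"], str.upper, workers=2):
        print(path, "->", prepared)
```

The bounded `depth` keeps memory use predictable: only a fixed number of decoded images are held at once, no matter how large the directory is.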
The CLI provides real-time feedback with confidence scores, allowing users to tune the sensitivity of the tagging engine to match their specific metadata standards.
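As a rough illustration of how a confidence threshold can filter tags, the sketch below turns raw image-to-label similarity scores into probabilities with a softmax and keeps the labels at or above the threshold. Whether ShutterSage-AI applies a softmax or thresholds the similarities directly is an assumption here, and `select_tags` is a hypothetical helper name; the logits are made-up values, not model output.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_tags(labels, logits, threshold=0.15):
    """Return (label, confidence) pairs at or above threshold, best first."""
    scored = sorted(
        zip(labels, softmax(logits)), key=lambda pair: pair[1], reverse=True
    )
    return [(label, conf) for label, conf in scored if conf >= threshold]


if __name__ == "__main__":
    labels = ["Mountain", "Sunset", "Portrait"]
    logits = [2.0, 1.0, -1.0]  # illustrative values only
    for label, conf in select_tags(labels, logits):
        print(f"{label}: {conf:.2f}")
```

Raising the threshold yields fewer, more certain tags; lowering it yields broader coverage at the cost of more false positives.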
Usage Example
The tool is designed to be developer- and photographer-friendly. A simple PowerShell script automates the environment setup:
.\start.ps1
Or run it directly via the CLI:
python main.py "C:\Path\To\MyPhotos" --threshold 0.15 --recursive
This command will recursively scan the directory, analyze each image, and generate XMP files containing tags like "Mountain", "Sunset", or "Portrait" based on the AI's visual understanding.
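For readers curious how such a command line might be wired up, here is a minimal sketch using `argparse` and `pathlib`. It mirrors the positional path and the `--threshold` and `--recursive` flags shown above, but the default threshold, the extension list, and the helper names (`build_parser`, `is_supported`, `find_images`) are assumptions rather than the contents of the actual `main.py`.

```python
import argparse
from pathlib import Path

# Assumed extension list; the formats ShutterSage-AI supports may differ.
SUPPORTED = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".cr2", ".nef", ".arw", ".dng"}


def build_parser():
    """Parser for: python main.py DIRECTORY [--threshold T] [--recursive]"""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("directory", type=Path, help="folder containing photos")
    parser.add_argument("--threshold", type=float, default=0.15,
                        help="minimum confidence for a tag (assumed default)")
    parser.add_argument("--recursive", action="store_true",
                        help="also scan subfolders")
    return parser


def is_supported(path):
    """Case-insensitive check against the assumed extension list."""
    return Path(path).suffix.lower() in SUPPORTED


def find_images(directory, recursive=False):
    """Return supported image files, descending into subfolders if asked."""
    pattern = "**/*" if recursive else "*"
    return sorted(
        p for p in Path(directory).glob(pattern) if p.is_file() and is_supported(p)
    )


if __name__ == "__main__":
    args = build_parser().parse_args()
    for image in find_images(args.directory, args.recursive):
        print(image)
```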