Image Captioning App

Overview

This application generates descriptive, natural-language captions for images. It processes single images or entire directories, combining a CLIP vision model with an LLM to produce contextual captions. NSFW images are supported and are captioned in the same natural-language style.

Features

Single image and batch processing
Multiple directory support
Custom output directory
Adjustable batch size
Progress tracking

Usage

Command	Description
`python app.py image.jpg`	Process a single image
`python app.py /path/to/directory`	Process all images in a directory
`python app.py /path/to/dir1 /path/to/dir2`	Process multiple directories
`python app.py /path/to/dir --output /path/to/output`	Specify output directory
`python app.py /path/to/dir --bs 8`	Set batch size (default: 4)
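
For readers who want to see how the commands above map onto arguments, here is a minimal sketch using Python's standard `argparse` module. The `--output` and `--bs` flags and the default batch size of 4 come from the table. The function name `build_parser`, the positional name `inputs`, and the help strings are assumptions for illustration, and the real `app.py` may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented commands (not the actual app.py code)."""
    parser = argparse.ArgumentParser(
        prog="app.py",
        description="Generate captions for images.",
    )
    # One or more image files and/or directories, as in the usage table.
    parser.add_argument("inputs", nargs="+", help="image files or directories to caption")
    # Optional output directory; None here means "use the app's default location" (assumed).
    parser.add_argument("--output", default=None, help="directory to write captions to")
    # Batch size; the documented default is 4.
    parser.add_argument("--bs", type=int, default=4, help="batch size (default: 4)")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the sketch runs without real command-line input.
    args = build_parser().parse_args(["/path/to/dir1", "/path/to/dir2", "--bs", "8"])
    print(args.inputs, args.output, args.bs)
```

Using `nargs="+"` lets a single positional argument cover the single-image, single-directory, and multiple-directory forms.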

Technical Details

Models: CLIP (vision), LLM (language), custom ImageAdapter
Optimization: CUDA-enabled GPU support
Error Handling: Skips problematic images in batch processing
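
The batch behavior described above (group images into batches, skip any that fail to load) could look roughly like the sketch below. It uses only the standard library. The helper names `collect_images`, `batched`, and `load_batch`, the extension list, and the non-recursive directory scan are all assumptions, not the application's actual code. The `loader` argument stands in for whatever image-opening function the app uses.

```python
from pathlib import Path
from typing import Any, Callable, Iterator, List, Sequence, Tuple

# Assumed set of image extensions; the real app may accept a different list.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def collect_images(inputs: Sequence[str]) -> List[Path]:
    """Expand files and directories into a flat, sorted-per-directory list of image paths."""
    paths: List[Path] = []
    for item in inputs:
        path = Path(item)
        if path.is_dir():
            # Non-recursive scan (an assumption): only direct children are considered.
            paths.extend(
                sorted(
                    p for p in path.iterdir()
                    if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
                )
            )
        elif path.is_file():
            paths.append(path)
    return paths


def batched(items: Sequence[Path], batch_size: int) -> Iterator[List[Path]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def load_batch(
    paths: Sequence[Path], loader: Callable[[Path], Any]
) -> Tuple[List[Tuple[Path, Any]], List[Tuple[Path, str]]]:
    """Load each image; record failures instead of aborting the whole batch."""
    loaded: List[Tuple[Path, Any]] = []
    skipped: List[Tuple[Path, str]] = []
    for path in paths:
        try:
            loaded.append((path, loader(path)))
        except Exception as exc:  # a problematic image is skipped, not fatal
            skipped.append((path, str(exc)))
    return loaded, skipped
```

Returning the skipped paths alongside the loaded ones keeps the run going while still letting the caller report which files were ignored.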

Requirements

Python 3.x
PyTorch
Transformers library
CUDA-capable GPU (recommended)
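
Since a GPU is recommended rather than required, one plausible way to choose a device is to use CUDA when PyTorch reports it as available and fall back to the CPU otherwise. The sketch below is an assumption about how that selection could be done, not the application's actual logic; `pick_device` is a hypothetical helper name.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and sees a CUDA GPU, else "cpu"."""
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Using device: {pick_device()}")
```

Running on the CPU should work with this kind of fallback but is likely to be much slower for CLIP and LLM inference.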

Installation

git clone https://huggingface.co/Wi-zz/joy-caption-pre-alpha
cd joy-caption-pre-alpha
pip install -r requirements.txt

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License.