GENAI FOR IMAGE CAPTIONING AND DESCRIPTION

Authors

  • A Satyanarayana, P Nithin, J Nagasai, P Chandhu, L Ajay

DOI:

https://doi.org/10.62643/

Abstract

The GENAI for Image Captioning and Description project applies Generative Artificial Intelligence, combining computer vision and Natural Language Processing, to automatically generate rich, accurate, and context-aware textual descriptions of images. The system analyzes visual content and produces captions that describe the objects, scenes, actions, attributes, and relationships in an image in a human-like manner. This technology supports applications such as accessibility support, smart surveillance, image indexing, digital media management, content recommendation, and automated visual understanding. The project implements a complete Vision-Language Model (VLM) pipeline using BLIP-2 (Bootstrapping Language-Image Pre-training 2) with the OPT-2.7B language model as the decoder. The system is trained and evaluated on the COCO Captions 2017 dataset, which contains more than 123,000 images with approximately five reference captions per image. The dataset covers diverse scenes such as indoor environments, outdoor activities, sports, food items, wildlife, and human interactions, which lets the model learn complex visual-semantic relationships. Experimental results include a CIDEr score of 145.8, exceeding the target benchmark of 130, and a BLEU-4 score of 38.6, reflecting high n-gram overlap with the reference captions, on 5,000 test images. The generated scene descriptions cover object identification, attribute recognition, spatial relationships, and activity understanding. Inference takes approximately 1.4 seconds per image on average on an NVIDIA A100 GPU.
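
For readers unfamiliar with the BLEU-4 metric reported above, the following is a minimal sketch of how a corpus-level BLEU-4 score can be computed from clipped n-gram precision and a brevity penalty. It is an illustration only, not the authors' evaluation code: the function names `ngrams` and `corpus_bleu4`, the whitespace tokenization, and the 0 to 100 scaling are assumptions, and published COCO results are normally produced with a standard caption evaluation toolkit that applies its own tokenizer.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(candidates, references_list):
    """Corpus-level BLEU-4 on a 0-100 scale (hypothetical helper).

    candidates: list of token lists, one generated caption per image.
    references_list: for each image, a list of reference token lists.
    """
    matched = [0] * 4   # clipped n-gram matches for n = 1..4
    total = [0] * 4     # candidate n-gram counts for n = 1..4
    cand_len = 0
    ref_len = 0
    for cand, refs in zip(candidates, references_list):
        cand_len += len(cand)
        # Use the reference length closest to the candidate length
        # (ties go to the shorter reference).
        ref_len += min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
        for n in range(1, 5):
            cand_counts = ngrams(cand, n)
            # Each n-gram is credited at most as often as it appears
            # in any single reference ("clipping").
            max_ref = Counter()
            for r in refs:
                for gram, count in ngrams(r, n).items():
                    max_ref[gram] = max(max_ref[gram], count)
            matched[n - 1] += sum(min(c, max_ref[g]) for g, c in cand_counts.items())
            total[n - 1] += sum(cand_counts.values())
    if min(matched) == 0:
        return 0.0  # no smoothing: any zero precision gives a zero score
    # Geometric mean of the four precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / 4
    # Brevity penalty for candidates shorter than their references.
    bp = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return 100 * bp * math.exp(log_precision)


# Usage with made-up captions (not taken from the dataset):
generated = ["a man rides a horse on the beach".split()]
references = [["a man rides a horse on the beach".split(),
               "a person riding a horse near the ocean".split()]]
print(corpus_bleu4(generated, references))
```

A caption that exactly matches one of its references scores 100 under this sketch, while a shorter caption is reduced by the brevity penalty even when all of its n-grams match.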

Published

12-06-2026

How to Cite

GENAI FOR IMAGE CAPTIONING AND DESCRIPTION. (2026). International Journal of Engineering Research and Science & Technology, 22(2(1)), 2470-2479. https://doi.org/10.62643/