Intelligent Image Analysis and Visual Question Answering System Using Deep Learning and Transformer Models

Authors

  • CHUTTUGULLA VAMSI
  • B. Suryanarayana Murthy

DOI:

https://doi.org/10.62643/

Abstract

Advances in artificial intelligence have transformed the field of image analysis, enabling
systems not only to recognize visual content but also to interpret and describe it in natural
language. This paper presents an intelligent image analysis and Visual Question Answering (VQA)
system that combines deep learning and transformer-based models to provide automated image
understanding, caption generation, and interactive answers to user questions. The system is
designed to handle both medical and general image datasets, which makes it applicable to
real-world uses such as healthcare diagnostics, wildlife recognition, and intelligent
assistants.
The proposed system uses ResNet50, a convolutional neural network, for image classification.
Pretrained models are fine-tuned to assign images to predefined classes such as medical
conditions or animal types. To improve interpretability, the system uses the BLIP
(Bootstrapping Language-Image Pre-training) model to generate human-like captions that describe
the content of the image. This gives users a contextual understanding of visual data without
manual inspection.
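
The classification and captioning steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`build_classifier`, `top_class`, `caption_image`), the ImageNet starting weights, and the `Salesforce/blip-image-captioning-base` checkpoint are all choices made for this example, since the abstract does not specify them.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of raw class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_class(logits: Sequence[float], class_names: Sequence[str]) -> Tuple[str, float]:
    """Return the most probable class name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


def build_classifier(num_classes: int):
    """ResNet50 with pretrained weights and a new head for fine-tuning.

    Assumption: ImageNet weights are the starting point; the abstract only
    says that pretrained models are fine-tuned.
    """
    import torch.nn as nn
    from torchvision.models import ResNet50_Weights, resnet50

    model = resnet50(weights=ResNet50_Weights.DEFAULT)
    # Replace the 1000-way ImageNet head with one sized for the target classes.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model


def caption_image(image_path: str) -> str:
    """Generate a caption with BLIP (checkpoint name is an assumption)."""
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    name = "Salesforce/blip-image-captioning-base"
    processor = BlipProcessor.from_pretrained(name)
    model = BlipForConditionalGeneration.from_pretrained(name)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

The heavy libraries are imported inside the functions so that the pure helpers (`softmax`, `top_class`) can be used and tested without downloading any model weights.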
The system also integrates a transformer-based language model, FLAN-T5, to support Visual
Question Answering. Users can enter questions about the selected image, and the system
generates answers in natural language. Combining vision and language models in this way yields
a knowledge-based interactive system that bridges the gap between image recognition and
semantic understanding.
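
One possible way to wire the question-answering step is sketched below. FLAN-T5 is a text-only model, and the abstract does not say how image content reaches it, so this example assumes that the generated caption and the predicted class label are passed to the model as textual context. The prompt wording, the function names (`build_prompt`, `answer_question`), and the `google/flan-t5-base` checkpoint are likewise assumptions made for illustration.

```python
from typing import Optional


def build_prompt(question: str, caption: str, label: Optional[str] = None) -> str:
    """Combine the image description and the user question into one prompt.

    Assumption: the image is represented only by its caption and,
    optionally, its predicted class label.
    """
    parts = [caption.strip().rstrip(".") + "."]
    if label:
        parts.append(f"Predicted class: {label}.")
    context = " ".join(parts)
    return (
        "Answer the question using the image description.\n"
        f"Description: {context}\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


def answer_question(
    question: str,
    caption: str,
    label: Optional[str] = None,
    model_name: str = "google/flan-t5-base",  # assumed checkpoint
) -> str:
    """Generate an answer with FLAN-T5 from the text-only prompt."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    inputs = tokenizer(build_prompt(question, caption, label), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=40)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Under this design the quality of an answer is bounded by the caption and label: details the caption omits are not available to the language model.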

Published

03-04-2026

How to Cite

Intelligent Image Analysis and Visual Question Answering System Using Deep Learning and Transformer Models. (2026). International Journal of Engineering Research and Science & Technology, 22(2), 366-375. https://doi.org/10.62643/