REASONING-DRIVEN HUMAN–OBJECT INTERACTION DETECTION USING YOLO-BASED LOCALIZATION AND VISION–LANGUAGE INTELLIGENCE

Authors

  • A. Kushyanth
  • D. Pranathi Sri
  • C. Kameswari Shreya
  • B. Vardhan
  • Ms V.P.V Bharathi

DOI:

https://doi.org/10.62643/ijerst.2026.v22.n1(2).pp233-237

Keywords:

Human–Object Interaction; YOLOv8; Spatial Reasoning; Vision–Language Model; BLIP-2; Object Detection; Computer Vision

Abstract

Human–Object Interaction (HOI) detection is a fundamental task in computer vision that aims to identify meaningful relationships between people and surrounding objects. This paper presents a reasoning-driven HOI detection system that integrates YOLOv8n-based object localization with a multi-tier spatial reasoning engine and a Vision–Language Model (VLM) backend. The proposed pipeline detects persons and objects, extracts candidate pairs using Euclidean distance and Intersection-over-Union (IoU) metrics, and applies rule-based and semantically grounded reasoning to classify interactions. On a diverse evaluation set spanning sports, office, and daily-living scenes, the system achieves 87.3% precision and 82.6% recall at a 120-pixel distance threshold, outperforming baseline IoU-only methods by 14.2 percentage points in F1-score. The modular design supports plug-in VLM reasoning (BLIP-2), and the system is deployed as a Flask web application that provides real-time visual explanations.
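
The candidate-pair step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of box centers for the Euclidean distance, the (x1, y1, x2, y2) box format, and the rule that a pair qualifies when either the distance or the IoU test passes are all assumptions made for illustration. Only the 120-pixel threshold and the reported precision and recall come from the abstract.

```python
import math

def center(box):
    """Center of an (x1, y1, x2, y2) box. The box format is an assumption."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def iou(a, b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def candidate_pairs(persons, objects, max_dist=120.0, min_iou=0.0):
    """Return (person_idx, object_idx, distance, iou) for plausible pairs.

    Assumption: a pair is kept when the center distance is within
    max_dist pixels OR the boxes overlap more than min_iou. The paper's
    actual combination rule is not stated in the abstract.
    """
    pairs = []
    for pi, p in enumerate(persons):
        for oi, o in enumerate(objects):
            d = math.dist(center(p), center(o))
            overlap = iou(p, o)
            if d <= max_dist or overlap > min_iou:
                pairs.append((pi, oi, d, overlap))
    return pairs

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical detections: one person, one nearby object, one distant object.
persons = [(0, 0, 100, 200)]
objects = [(80, 100, 140, 160), (500, 500, 560, 560)]
pairs = candidate_pairs(persons, objects)
```

With these hypothetical boxes, only the nearby object is paired with the person; the distant one fails both tests. Applying `f1_score` to the reported 87.3% precision and 82.6% recall gives an F1-score of roughly 84.9%.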

Published

28-03-2026

How to Cite

REASONING-DRIVEN HUMAN–OBJECT INTERACTION DETECTION USING YOLO-BASED LOCALIZATION AND VISION–LANGUAGE INTELLIGENCE. (2026). International Journal of Engineering Research and Science & Technology, 22(1(2)), 233-237. https://doi.org/10.62643/ijerst.2026.v22.n1(2).pp233-237