CLARITYGATE: A FASTAPI-BASED DUPLICATE DETECTION AND TAGGING SYSTEM

Authors

  • Dr. M.V. NARAYANA, HARSHITHA THIRUMAL, S. VAISHNAVI

DOI:

https://doi.org/10.62643/

Keywords:

Duplicate Detection, Data Deduplication, Text Similarity

Abstract

This project presents a Python-based AI service that detects duplicate or near-duplicate text and provides basic tagging across web platforms. Built with FastAPI, scikit-learn, and NumPy, it uses TF-IDF vectorization and cosine similarity to compare new content against an existing corpus in real time. The system exposes simple REST APIs, includes an embeddable JavaScript widget that runs background checks while users type, and supports configurable similarity thresholds. A proof-of-concept dataset of 22 posts demonstrates the functionality; tagging is currently a placeholder endpoint intended for future NLP upgrades. The solution is portable, privacy-friendly (self-hostable), and easy to integrate into forums, support portals, and e-commerce Q&A pages.
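
The core comparison step described above can be illustrated with a minimal sketch using scikit-learn and NumPy. This is not the authors' implementation: the function name `find_duplicates`, the three-post sample corpus, the English stop-word setting, and the default threshold of 0.6 are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical sample corpus; the paper's 22-post dataset is not reproduced here.
CORPUS = [
    "How do I reset my account password?",
    "What is the return policy for damaged items?",
    "The app crashes when I upload a photo.",
]


def find_duplicates(new_text, corpus, threshold=0.6):
    """Return (index, score) pairs for corpus posts similar to new_text.

    The threshold is an assumed default; the abstract only states that
    thresholds are configurable.
    """
    # Fit TF-IDF weights on the existing corpus only.
    vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
    corpus_matrix = vectorizer.fit_transform(corpus)

    # Project the new text into the same vector space.
    query_vec = vectorizer.transform([new_text])

    # Cosine similarity between the new text and every corpus post.
    scores = cosine_similarity(query_vec, corpus_matrix).ravel()

    # Keep matches at or above the threshold, best match first.
    order = np.argsort(scores)[::-1]
    return [(int(i), float(scores[i])) for i in order if scores[i] >= threshold]


if __name__ == "__main__":
    for idx, score in find_duplicates("How can I reset my account password?", CORPUS):
        print(f"{score:.2f}  {CORPUS[idx]}")
```

In a service setting, the vectorizer and corpus matrix would more plausibly be built once at startup and reused per request rather than refit on every call; the sketch refits each time only to stay self-contained.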

Published

10-02-2026

How to Cite

CLARITYGATE: A FASTAPI-BASED DUPLICATE DETECTION AND TAGGING SYSTEM. (2026). International Journal of Engineering Research and Science & Technology, 22(1), 250-254. https://doi.org/10.62643/