The Data System Gap: Identifying Inconsistencies in Information Pipelines

Authors

  • G. Sreelekha, G. Vijaya, B. Tejaswini, B. Uday Kumar

DOI:

https://doi.org/10.62643/

Keywords:

Data Quality, Data Profiling, Pipeline Debugging, Data Repair, Machine Learning, Django

Abstract

Failures in data-driven systems often originate from inconsistencies in the data rather than from errors in software logic. This paper implements a DataPrism-inspired system for detecting and repairing dataset issues in machine learning (ML) pipelines. The system compares reference (passing) and problematic (failing) datasets to identify discriminative data profiles, including missing values, domain drift, population bias, and schema inconsistencies. Detected issues are corrected through automated data imputation and normalization. A Django web interface supports dataset analysis, issue visualization, repair, and download. Evaluation of five ML models (Random Forest, Gradient Boosting, SVM, AdaBoost, and Logistic Regression) on the original versus repaired datasets shows that data repair improves average model accuracy by 28.5%. The system thus provides an automated approach for detecting data-system mismatches and improving the reliability of data-driven systems.
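
The compare-then-repair idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`profile`, `discriminative_issues`, `repair`), the `tol` threshold, and the toy data are hypothetical. It covers only two of the listed profile types (missing values and domain drift), and it uses mean imputation plus clipping to the reference range as a simple stand-in for the paper's imputation and normalization steps.

```python
from statistics import mean


def profile(rows, column):
    """Summarise one numeric column: share of missing values and observed range."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    return {
        "missing_rate": 1 - len(present) / len(values) if values else 0.0,
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


def discriminative_issues(passing, failing, columns, tol=0.05):
    """List profile differences that separate the failing dataset from the passing one.

    `tol` is an assumed threshold on the increase in missing rate.
    """
    issues = []
    for col in columns:
        ref, bad = profile(passing, col), profile(failing, col)
        if bad["missing_rate"] - ref["missing_rate"] > tol:
            issues.append((col, "missing_values"))
        if ref["max"] is not None and bad["max"] is not None:
            if bad["min"] < ref["min"] or bad["max"] > ref["max"]:
                issues.append((col, "domain_drift"))
    return issues


def repair(passing, failing, issues):
    """Return a repaired copy of `failing`, using the passing data as reference."""
    repaired = [dict(row) for row in failing]
    for col, kind in issues:
        ref_values = [r[col] for r in passing if r.get(col) is not None]
        if not ref_values:
            continue  # nothing to learn a repair from
        if kind == "missing_values":
            fill = mean(ref_values)  # mean imputation from the reference data
            for row in repaired:
                if row.get(col) is None:
                    row[col] = fill
        elif kind == "domain_drift":
            lo, hi = min(ref_values), max(ref_values)
            for row in repaired:
                if row.get(col) is not None:
                    row[col] = min(max(row[col], lo), hi)  # clip to reference range
    return repaired


# Toy data (hypothetical): `income` is mostly missing and `age` has an
# out-of-range value in the failing dataset.
passing = [{"age": 30, "income": 40.0}, {"age": 40, "income": 60.0}, {"age": 50, "income": 50.0}]
failing = [{"age": 35, "income": None}, {"age": 400, "income": 55.0}, {"age": 45, "income": None}]

issues = discriminative_issues(passing, failing, ["age", "income"])
repaired = repair(passing, failing, issues)
```

In this sketch a profile counts as discriminative only when it differs between the two datasets, so an issue that is present in both the passing and the failing data is not reported. Population bias and schema checks would need additional profile types that are not shown here.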

Published

28-03-2026

How to Cite

The Data System Gap: Identifying Inconsistencies in Information Pipelines. (2026). International Journal of Engineering Research and Science & Technology, 22(1(2)), 139-143. https://doi.org/10.62643/