ENSEMBLE-EMPOWERED TF-IDF: SCALABLE NLP FOR AUTOMATED ECLIPSE BUG REPORT CLASSIFICATION

T. Pravalika; Lakshmi Meenakshi Nirukurthy; Chinmayee Madduri; Bhavana Pulla

doi:10.62643/ijerst.v21.n3(1).pp646-651

Keywords:

Bug Report Classification, TF-IDF, Natural Language Processing (NLP), Text Classification, Eclipse

Abstract

Software maintenance in large open-source ecosystems depends heavily on fast and accurate bug
triage. The Eclipse project alone receives tens of thousands of issue reports annually, spanning a wide
range of components and severity levels. Traditionally, human experts triage these reports manually, a
process that is time-consuming, inconsistent, and increasingly unmanageable as the project grows.
Previous attempts to automate this task with single machine learning models such as
Support Vector Machines (SVM) or Logistic Regression have achieved moderate accuracy
(approximately 70–75%), but they often rely on extensive feature engineering and
struggle to generalize across evolving bug datasets. This research proposes a scalable, ensemble-based
framework for automating Eclipse bug classification with high accuracy. The system passes raw
bug descriptions through a preprocessing pipeline (tokenization, stop word removal,
and lemmatization) and converts the cleaned text into numerical features using Term Frequency-Inverse
Document Frequency (TF-IDF). It then trains and evaluates five classifiers on a curated
Eclipse–Mozilla dataset: SVM, Random Forest Classifier (RFC), Logistic Regression Classifier
(LRC), an Extra Trees Voting (EV) ensemble, and Extreme Gradient Boosting (XGBoost). A user-friendly
GUI integrated into the system lets non-experts upload data, visualize preprocessing steps, and select
models. With a 70/30 train-test split, the models perform as follows: SVM achieves 74.23% accuracy,
83.03% precision, 73.94% recall, and a 75.24% F₁ score; RFC achieves 83.51% accuracy, 87.68% precision,
83.40% recall, and an 84.09% F₁ score; LRC achieves 70.10% accuracy, 75.06% precision, 70.19% recall,
and a 71.10% F₁ score; the EV ensemble achieves 89.69% accuracy, 91.10% precision, 90.29% recall, and
a 90.21% F₁ score; and XGBoost outperforms all others with 92.27% accuracy, 92.91% precision, 92.65%
recall, and a 92.50% F₁ score. These results underscore the strength of ensemble methods, particularly
XGBoost, in delivering reliable and scalable bug classification for large open-source projects.
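The preprocessing and TF-IDF stages described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list and the `LEMMAS` lookup table are small hypothetical stand-ins for a full stop-word list and a real lemmatizer (such as a WordNet-based one), and the weighting uses the smoothed idf, ln((1 + N) / (1 + df)) + 1, with L2 normalization, which is scikit-learn's `TfidfVectorizer` default. The abstract does not say which TF-IDF variant is used. The three bug reports are invented examples.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (real pipelines use a fuller one).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "when", "on", "in",
              "of", "to", "and", "or", "it", "this", "with", "after"}

# Hypothetical lookup table standing in for a real lemmatizer.
LEMMAS = {"crashes": "crash", "crashed": "crash", "crashing": "crash",
          "editors": "editor", "opened": "open", "opening": "open",
          "files": "file", "views": "view", "throws": "throw"}


def preprocess(text):
    """Tokenize, drop stop words, and lemmatize one bug description."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())      # tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop word removal
    return [LEMMAS.get(t, t) for t in tokens]            # lemmatization (lookup)


def tfidf(docs):
    """Return one L2-normalized {term: weight} dict per token list.

    Raw term counts times the smoothed idf ln((1 + N) / (1 + df)) + 1
    (an assumed variant; the paper does not specify one).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        weights = {t: c * idf[t] for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


reports = [
    "Editor crashes when opening large files",
    "The editor crashed after opening the Java perspective",
    "Typo in the preferences dialog label",
]
docs = [preprocess(r) for r in reports]
vecs = tfidf(docs)
print(docs[0])  # ['editor', 'crash', 'open', 'large', 'file']
```

Terms shared by several reports (such as `editor` and `crash`) receive a lower idf than terms that appear in only one report (such as `large`), so the resulting vectors emphasize the vocabulary that distinguishes one report from another.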
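The evaluation protocol summarized above (a 70/30 train-test split, then accuracy, precision, recall, and F₁ for each model) can be sketched as follows. The abstract does not say whether the split is stratified or how per-class scores are averaged across bug categories, so this sketch assumes a plain seeded shuffle and macro averaging (the unweighted mean over classes). Macro averaging is only one plausible reading. It fits the reported numbers better than support-weighted averaging, because weighted recall is always identical to accuracy, while the reported recall values differ slightly from accuracy. The helper names `split_70_30` and `macro_scores` and the example labels are hypothetical.

```python
import random


def split_70_30(items, seed=42):
    """Shuffle with a fixed seed, then split 70% train / 30% test (not stratified)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = round(0.7 * len(items))
    return items[:cut], items[cut:]


def macro_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        n_pred = sum(p == label for _, p in pairs)   # predicted as this class
        n_true = sum(t == label for t, _ in pairs)   # actually this class
        prec = tp / n_pred if n_pred else 0.0        # 0 when undefined
        rec = tp / n_true if n_true else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(labels)
    return accuracy, sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


# Hypothetical component labels for six test reports.
train, test = split_70_30(range(10))
y_true = ["ui", "ui", "core", "core", "core", "debug"]
y_pred = ["ui", "core", "core", "core", "debug", "debug"]
scores = macro_scores(y_true, y_pred)
print(" ".join(f"{s:.2%}" for s in scores))  # 66.67% 72.22% 72.22% 66.67%
```

In practice each of the five classifiers would be fit on the training portion and scored on the held-out 30% with the same metric function, so that the reported figures are directly comparable across models.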