Optimizing Performance with Parallel K-Means in Tunnel Monitoring Data Clustering Algorithm for Cloud Computing

Authors

  • Vijaykumar Mamidala

Keywords:

K-means clustering, Parallel computing, MapReduce, Scalability, Fault tolerance, Dynamic load balancing, Real-time processing, High-dimensional data

Abstract

This paper introduces a parallel K-means clustering approach intended to maximize cloud computing
performance in tunnel monitoring data analysis. Traditional sequential K-means has high processing
complexity, which makes it impractical for large-scale datasets. Parallel K-means, which uses
distributed computing frameworks such as MapReduce, reduces this difficulty by dividing processing
jobs among several nodes. The technique creates a centroid representation for each cluster in the
dataset and updates the centroids iteratively until convergence. Scalability, performance
optimization, effective data management, and fault tolerance are the main goals, all of which are
essential for cloud-based data processing pipelines. Despite progress in these areas, research gaps
remain in dynamic load balancing, parameter selection, real-time processing, energy efficiency, and
the handling of high-dimensional data. The primary issue addressed is the inefficiency of sequential
K-means on big datasets, which is made worse by the growing volume, variety, and velocity of modern
data. By leveraging MapReduce, the parallel K-means technique addresses the drawbacks of the
sequential approach and clusters large datasets effectively. The methodology covers data
preprocessing, MapReduce-based algorithm execution, system architecture, and performance assessment
metrics. The experimental design varies the number of clusters, the size of the dataset, and the
number of iterations in order to evaluate execution time, speedup, scalability, and cluster quality.
The results show notable performance gains, which makes parallel K-means an important tool for
contemporary data analytics, especially in cloud settings. Ongoing research aims to improve
real-time processing, parameter selection, and load balancing in order to increase the algorithm's
efficiency and suitability for big data applications.
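
The iterative centroid update described in the abstract can be sketched as a map step and a reduce step. The following is a minimal single-process illustration, not the paper's implementation: the function names (map_phase, reduce_phase, parallel_kmeans), the chunking of the data, the Euclidean distance, and the tol and max_iter parameters are all assumptions made for the example. In an actual MapReduce deployment, each call to map_phase would run on a separate node.

```python
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(chunk, centroids):
    """Map + combine (hypothetical helper): per-chunk partial sums
    keyed by the index of the nearest centroid."""
    partial = {}
    for p in chunk:
        k = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        s, n = partial.get(k, ([0.0] * len(p), 0))
        partial[k] = ([a + b for a, b in zip(s, p)], n + 1)
    return partial


def reduce_phase(partials, centroids):
    """Reduce (hypothetical helper): merge partial sums from all chunks
    and recompute each centroid as the mean of its assigned points."""
    merged = {}
    for partial in partials:
        for k, (s, n) in partial.items():
            ms, mn = merged.get(k, ([0.0] * len(s), 0))
            merged[k] = ([a + b for a, b in zip(ms, s)], mn + n)
    # Assumption: a centroid that attracted no points keeps its old position.
    return [
        tuple(x / merged[k][1] for x in merged[k][0]) if k in merged else tuple(c)
        for k, c in enumerate(centroids)
    ]


def parallel_kmeans(chunks, centroids, max_iter=20, tol=1e-6):
    """Repeat map and reduce until no centroid moves more than tol."""
    for _ in range(max_iter):
        # Each chunk would be processed by a different worker in a cluster.
        partials = [map_phase(chunk, centroids) for chunk in chunks]
        updated = reduce_phase(partials, centroids)
        shift = max(dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift < tol:
            break
    return centroids
```

Calling parallel_kmeans with a list of data chunks (each a list of numeric tuples) and a list of initial centroids returns the final centroids. Emitting per-chunk sums and counts rather than individual point assignments keeps the data passed to the reduce step small, which is one common way to limit communication between nodes.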

 

Published

27-12-2021

How to Cite

Optimizing Performance with Parallel K-Means in Tunnel Monitoring Data Clustering Algorithm for Cloud Computing. (2021). International Journal of Engineering Research and Science & Technology, 17(4), 34-49. https://ijerst.org/index.php/ijerst/article/view/99