Journal of Software, Vol 4, No 10 (2009), 1119-1126, Dec 2009
doi:10.4304/jsw.4.10.1119-1126

An Efficient Parallel Clustering Algorithm for Large Scale Database

Jianfeng Yang, Puliu Yan, Yinbo Xie, Qing Geng, Jolly Wang, Nick Bao

Abstract


In this paper, we propose a new parallel clustering algorithm, named Stem-Leaf-Point Plot Clustering Algorithm (SLPPCA). SLPPCA can produce clusters of different shapes and sizes and, according to our experiments, produces them more efficiently than traditional methods. SLPPCA fully exploits the data-parallelism of data objects and adopts a task decomposition design step to balance the workloads of multi-core processors and achieve a high speedup. We applied SLPPCA to large-scale databases on dual-core and quad-core computers separately and analyzed its performance. The experimental results show that the clusters it produced were particularly good for data of differing densities or shapes. Furthermore, with the parallel pattern used in SLPPCA on a multi-core platform, the speedup was almost linear in the number of processor cores and the number of data points. Moreover, SLPPCA automatically determines a satisfactory number of clusters during the clustering process.
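The abstract does not spell out the task decomposition step or how speedup was measured, so the following is only a minimal sketch of the general idea, not the SLPPCA procedure itself. It assumes the data points are split into contiguous, near-equal chunks (one per core) and that speedup is the usual ratio of serial to parallel run time. The names `balanced_chunks`, `speedup`, and `efficiency` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def balanced_chunks(points: Sequence[T], workers: int) -> List[Sequence[T]]:
    """Split `points` into `workers` contiguous chunks whose sizes differ by at most one.

    Assumption: an even split of data objects stands in for the paper's
    workload-balancing step, which the abstract does not describe in detail.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    base, extra = divmod(len(points), workers)
    chunks = []
    start = 0
    for i in range(workers):
        # The first `extra` chunks take one additional point each.
        size = base + (1 if i < extra else 0)
        chunks.append(points[start:start + size])
        start += size
    return chunks


def speedup(serial_seconds: float, parallel_seconds: float) -> float:
    """Standard speedup: serial run time divided by parallel run time."""
    return serial_seconds / parallel_seconds


def efficiency(serial_seconds: float, parallel_seconds: float, workers: int) -> float:
    """Parallel efficiency: speedup per worker (1.0 means perfectly linear speedup)."""
    return speedup(serial_seconds, parallel_seconds) / workers
```

With made-up timings used purely for illustration, a serial run of 8.0 s and a four-core run of 2.0 s would give `speedup(8.0, 2.0) == 4.0` and `efficiency(8.0, 2.0, 4) == 1.0`, which is what "almost linear" speedup approaches.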



Keywords


Clustering, SLPPCA, SLPP, Parallel Processing, Performance Analysis, Parallel Pattern






Journal of Software (JSW, ISSN 1796-217X)

Copyright @ 2006-2011 by ACADEMY PUBLISHER – All rights reserved.