A Graph Clustering Algorithm Providing Scalability
Abstract
Abstract—Based on the current studies on the algorithms of the affinity propagation and normalized cut, a new scalable graph clustering method called APANC (Affinity Propagation And Normalized Cut) is proposed in this paper. During the APANC process, we firstly use the “Affinity Propagation” (AP) to preliminarily group the original data in order to reduce the data-scale, and then we further group the result of AP using “Normalized Cut” (NC) to get the final result. Through such combination, the advantages of AP in time costs and the advantages of NC in accuracy have been adopted. The experimental results show that even though APANC includes two clustering processes, this two-phase algorithm helps to reduce the experiment time compared to NC, and meanwhile, maintain the accuracy. Furthermore, the advantages of APANC in time costs could be greater when data scale increases.
Keywords
References
Jain A.K., Murty M.N., Flynn P.J.: Data Clustering: A Review. ACM Computing Surveys, 31 (1999) 264–323.
http://dx.doi.org/10.1145/331499.331504
Xu R., Wunsch II D.: Survey of Clustering Algorithms. IEEE Trans. on Neural Networks, 16(3) (2005) 645–678.
http://dx.doi.org/10.1109/TNN.2005.845141
PMid:15940994
Jiabing Wang, Hong Peng, Jingsong Hu, Chuangxin Yang, C.: A graph clustering algorithm based on minimum and normalized cut. In: Proceedings of the Seventh International Conference on Computational Science, pp. 497–504 (2007)
Sun JG, Liu J, Zhao LY, “Clustering algorithms research,” Journal of Software, 2008,19(1):48−61.
http://dx.doi.org/10.3724/SP.J.1001.2008.00048
Brendan J. Frey, Delbert Dueck, “Clustering by passing messages between data points,” Science 315,972(2007).
http://dx.doi.org/10.1126/science.1136800
PMid:17218491
M. J. Brusco, H.-F. Köhn, Science 319, 726 (2008); www.sciencemag.org/cgi/content/full/319/5864/726c.
http://dx.doi.org/10.1126/science.1150938
PMid:18258881
Brendan J. Frey, Delbert Dueck, Response to comment on "Clustering by passing messages between data points", Science 319, 726d (2008).
Shi J., Malik J.: Normalized Cuts and Image Segmentation. IEEE Trans. on Pattern Analysis and Machine Intelligence, 22(8) (2000) 888–905.
http://dx.doi.org/10.1109/34.868688
Carballido-Gamio J., Belongie S., Majumdar J. S.: Normalized Cuts in 3-D for Spinal MRI Segmentation. IEEE Trans. on Medical Imaging, 23(1) (2004) 36–44.
http://dx.doi.org/10.1109/TMI.2003.819929
PMid:14719685
Duarte A., Sánchez Á., Fernández F., et al: Improving Image Segmentation Quality through Effective Region Merging Using a Hierarchical Social Metaheuristic. Pattern Recognition Letters, 27 (2006) 1239–1251.
http://dx.doi.org/10.1016/j.patrec.2005.07.022
He X., Zha H., Ding C. H.Q., et al: Web Document Clustering Using Hyperlink Structures. Computational Statistics & Data Analysis, 41 (2002) 19–45.
http://dx.doi.org/10.1016/S0167-9473(02)00070-1
Li H., Chen W., Shen I-F.: Segmentation of Discrete Vector Fields. IEEE Trans. on Visualization and Computer Graphics, 12(3) (2006) 289–300.
http://dx.doi.org/10.1109/TVCG.2006.54
PMid:16640243
Ngo C., Ma Y., Zhang H.: Video Summarization and Scene Detection by Graph Modeling. IEEE Trans. on Circuits and Systems for Video Technology, 15(2) (2005) 296–305.
http://dx.doi.org/10.1109/TCSVT.2004.841694
Yu Stella X., Shi J.: Segmentation Given Partial Grouping Constraints. IEEE Trans. on Pattern Analysis and Machine Intelligence, 26(2) (2004) 173–183.
http://dx.doi.org/10.1109/TPAMI.2004.1262179
PMid:15376893
Xiaobin Li, Zheng Tian “Optimum cut-based clustering” ScienceDirect Signal Processing 87 (2007) 2491-2502
http://dx.doi.org/10.1016/j.sigpro.2007.03.017
Full Text: PDF


