An Interactive Tool for Human Active Learning in Constrained Clustering
Abstract
This paper describes an interactive tool for constrained clustering that helps users efficiently select effective constraints during the clustering process. Constrained clustering is a promising technique for smart data aggregation and filtering, which are indispensable for user activity on the Web. To make the technique more practical, constraint selection needs an effective bias. We approach this problem by incorporating human bias through an easy-to-manipulate interactive tool. The tool provides functions such as a 2-D visual arrangement of a dataset and constraint assignment by mouse manipulation. It can also execute distance metric learning and k-means clustering. In this paper, we give an overview of the tool and how it works, focusing on display arrangement by multi-dimensional scaling and on incremental distance metric learning. In the experiments, we investigated the performance of a sampling heuristic found by observing the interaction between users and our tool. The results show that the heuristic outperforms random sampling on two benchmark datasets from the UCI repository and on a Web page dataset from the Open Directory Project.
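The pipeline named in the abstract (a 2-D arrangement by multi-dimensional scaling, pairwise constraints, and k-means clustering) can be illustrated with a minimal sketch. The abstract does not specify which MDS variant or metric learning algorithm the tool uses, so the code below is an assumption-laden illustration rather than the tool's implementation: it uses classical (Torgerson) MDS and plain k-means, omits distance metric learning entirely, and the function names `classical_mds`, `kmeans`, and `count_violations` are hypothetical.

```python
import numpy as np

def classical_mds(X, dim=2):
    """Embed the rows of X into `dim` dimensions with classical MDS."""
    # Squared Euclidean distances between all pairs of points.
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    n = X.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    # Keep the `dim` largest eigenpairs; clip tiny negative eigenvalues.
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dim]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

def kmeans(X, k, iters=50, seed=0):
    """Plain (unconstrained) k-means; returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign every point to its nearest center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each non-empty cluster's center to the mean of its points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def count_violations(labels, must_link, cannot_link):
    """Count pairwise constraints that a clustering fails to satisfy."""
    ml = sum(labels[i] != labels[j] for i, j in must_link)
    cl = sum(labels[i] == labels[j] for i, j in cannot_link)
    return int(ml + cl)

# Toy data (an assumption for illustration): two well-separated groups.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
Y = classical_mds(X, dim=2)           # 2-D coordinates for display
labels = kmeans(Y, k=2)               # cluster in the displayed space
must_link = [(0, 1)]                  # pairs a user says belong together
cannot_link = [(0, 3)]                # pairs a user says must be separated
violations = count_violations(labels, must_link, cannot_link)
```

In an interactive setting, the constraint lists would come from the user's mouse input, and the count of violated constraints is one simple signal for deciding whether the distance metric needs to be updated before re-clustering.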


