Application of Rank Correlation, Clustering and Classification in Information Security