Journal of Advances in Information Technology, Vol 3, No 2 (2012), 107-114, May 2012
doi:10.4304/jait.3.2.107-114

Feature Optimization and Performance Evaluation of Machine Learning Algorithms for Identification of P2P Traffic

Sunil Agrawal, Balwinder S. Sohi

Abstract


P2P applications are believed to constitute a substantial proportion of today's Internet traffic. The ability to accurately identify different P2P applications in Internet traffic is important to a broad range of network operations, including application-specific traffic engineering, capacity planning, resource provisioning, and service differentiation. However, current P2P applications use several obfuscation techniques, including dynamic port numbers, port hopping, and encrypted payloads. As P2P applications continue to evolve, robust and effective methods are needed to identify them. It is general practice to reduce the cost of classification by reducing the number of features with a feature selection algorithm. But such algorithms are highly data-dependent and do not yield good results when applied to other data sets. In this paper, we propose an optimized set of features and compare five supervised ML algorithms for identification of P2P traffic. We find that NBTree outperforms the other ML algorithms, with 96.6% precision and 99.7% recall, when they are trained and tested on the same data set. In terms of training time, BayesNet is the best, with precision and recall very close to those of NBTree.
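The precision and recall figures quoted above follow the standard definitions for a single target class. As a minimal sketch, assuming per-flow labels in which "P2P" marks the positive class, they could be computed as shown here; the function name and the sample labels are hypothetical and are not taken from the paper's data set.

```python
def precision_recall(y_true, y_pred, positive="P2P"):
    """Return (precision, recall) for one class from paired label lists."""
    pairs = list(zip(y_true, y_pred))
    # True positives: flows that are P2P and were classified as P2P.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: non-P2P flows that were classified as P2P.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: P2P flows that the classifier missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels for six flows (illustration only, not real results).
y_true = ["P2P", "P2P", "P2P", "other", "other", "P2P"]
y_pred = ["P2P", "P2P", "other", "P2P", "other", "P2P"]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Precision penalizes non-P2P flows wrongly flagged as P2P, while recall penalizes P2P flows that go undetected, which is why the two are reported together.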



Keywords


Flow features, Feature selection, Machine learning (ML) algorithms, Traffic classification




Journal of Advances in Information Technology (JAIT, ISSN 1798-2340)
