Journal of Networks, Vol 7, No 6 (2012), 946-955, Jun 2012
doi:10.4304/jnw.7.6.946-955

A Comparison of the Classification of Disparate Malware Collected in Different Time Periods

Rafiqul Islam, Ronghua Tian, Veelasha Moonsamy, Lynn M Batten

Abstract


It has been argued that an anti-virus strategy based on malware collected at a certain date will not work at a later date, because malware evolves rapidly and an anti-virus engine is then faced with a completely new type of executable that is not as amenable to detection as the first was.

In this paper, we test this idea by collecting two sets of malware, the first from 2002 to 2007 and the second from 2009 to 2010, to determine how well the anti-virus strategy we developed based on the earlier set [18] performs on the later set. This strategy integrates dynamic and static features extracted from the executables to classify malware by distinguishing between families. To investigate the same idea, we also perform a second test in which we accumulate all the malware executables in the old and new datasets, separately, and apply a malware versus cleanware classification.

The resulting classification accuracies are close for the two datasets: in both experiments the difference is approximately 5.4%, with the older malware classified more accurately than the newer malware. This leads us to conjecture that current anti-virus strategies can indeed be modified to deal effectively with new malware.
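As a rough illustration of the comparison described above, the following minimal sketch combines a static and a dynamic feature vector for one executable and computes the accuracy gap between an older and a newer dataset. It is not the authors' implementation: the function names, the use of simple concatenation to integrate features, the reading of the reported difference as percentage points, and all toy values are assumptions made only for illustration.

```python
def integrate(static_features, dynamic_features):
    """Combine static and dynamic features into one vector.

    Plain concatenation is an assumption; the abstract only says the
    strategy "integrates" the two kinds of features.
    """
    return list(static_features) + list(dynamic_features)


def accuracy(predicted, actual):
    """Fraction of samples whose predicted label equals the true label."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def accuracy_gap(older_accuracy, newer_accuracy):
    """Gap in percentage points; positive when the older set scores higher."""
    return (older_accuracy - newer_accuracy) * 100.0


if __name__ == "__main__":
    # Hypothetical toy vectors and labels, not data from the paper.
    sample = integrate([12, 0, 3], [1, 1, 0, 1])
    print("integrated feature vector:", sample)

    truth = ["famA", "famB", "famA", "famB"]
    older = accuracy(["famA", "famB", "famA", "famB"], truth)
    newer = accuracy(["famA", "famA", "famA", "famB"], truth)
    print("accuracy gap (percentage points):", accuracy_gap(older, newer))
```

In this sketch the classifier itself is left out on purpose; any learner that outputs one label per executable could supply the predicted labels for either the family classification or the malware versus cleanware test.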



Keywords


malware, classification, static, dynamic.

References


 

[1] Bailey, M., Oberheide, J., Andersen, J., Mao, Z., Jahanian, F. and Nazario, J. (2007) Automated Classification and Analysis of Internet Malware, Chapter in Recent Advances in Intrusion Detection, LNCS 4637, 178-197, 2007.
http://dx.doi.org/10.1007/978-3-540-74320-0_10

[2] Barford, P., Yagneswaran, V. (2007) An inside look at botnets. Advances in Information Security, Springer, Heidelberg, 171-191.

[3] Bilenko, M., Mooney, R., Cohen, W., Ravikumar, P. and Fienberg, S., (2003) Adaptive name matching in information integration, Intelligent Systems, IEEE 18(5), 16 - 23.
http://dx.doi.org/10.1109/MIS.2003.1234765

[4] Fossi, M., Mack, T., Mazurek, D., Egan, G., Adams, T., McKinney, D., Haley, K., Blackbird, J., Wood, P., Johnson, E. and Low, M. (2011), 'Symantec Internet security threat report: Trends for 2010', Volume 16, 20 pages.

[5] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P. and Witten, I. (2009) The WEKA Data Mining Software: An Update, SIGKDD Explorations, Volume 11, Issue 1.

[6] Islam, R., Tian, R., Moonsamy, V., Batten, L., Versteeg, S.: A cumulative timeline approach to malware detection. Submitted.

[7] Kohavi, R. (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI, 1137–1145

[8] Lee, J.; Im, C. and Jeong, H. (2011), A study of malware detection and classification by comparing extracted strings, in 'Proceedings of the 5th International Conference on Ubiquitous Information Management and Communication', ACM, New York, NY, USA, 75:1-75:4.

[9] Nair, V., Jain, H., Golecha, Y., Gaur, M. and Laxmi, V. (2010), MEDUSA: Metamorphic malware dynamic analysis using signature from API, in 'Proceedings of the 3rd international conference on Security of information and networks', ACM, 263-269.

[10] Rajab, M., Ballard, L., Jagpal, N., Mavrommatis, P., Nojiri, D., Provos, N. and Schmidt, L. (2011) Trends in Circumventing Web-Malware Detection. Google Technical Report.

[11] Rosayid, N., Ohrui, M., Kikuchi, H., Sooraksa, P. and Terada, M. (2010) A Discovery of Sequential Attack Patterns of Malware in Botnets, Information Processing Society of Japan, SMC 2010, 2564-2570.

[12] Roundy, K. and Miller, B. (2010), Hybrid Analysis and Control of Malware, in 'Recent Advances in Intrusion Detection', Springer Berlin - Heidelberg, 317-338.
http://dx.doi.org/10.1007/978-3-642-15512-3_17

[13] Sukwong, O., Kim, H. and Hoe, J. (2010) An Empirical Study of Commercial Antivirus Software Effectiveness, Computer 44 (3), 63-70.
http://dx.doi.org/10.1109/MC.2010.187

[14] Tang, H., Zhu, B. and Ren, K. (2009), A New Approach to Malware Detection, in Jong Park; Hsiao-Hwa Chen; Mohammed Atiquzzaman; Changhoon Lee; Tai-hoon Kim & Sang-Soo Yeo, ed., 'Advances in Information Security and Assurance', Springer Berlin / Heidelberg, 229-238.

[15] Tian, R., Batten, L., and Versteeg, S. (2008) Function length as a tool for malware classification. In Proceedings of the 3rd International Conference on Malicious and Unwanted Software: MALWARE 2008, 69–76.

[16] Tian, R., Islam, R., Batten, L., and Versteeg, S. (2010) Differentiating malware from cleanware using behavioural analysis. In Proceedings of the 5th International Conference on Malicious and Unwanted Software: MALWARE 2010, 23-30.

[17] Witten, I., Frank, E., Trigg, L., Hall, M., Holmes, G. and Cunningham, S. (1999) Weka: Practical machine learning tools and techniques with Java implementations, in Computer Science Working Papers, 10289/1040, University of Waikato, 192-196.

[18] Yagi, T., Tanimoto, N., Hariu, T. and Itoh, M. (2010) Investigation and analysis of malware on websites, in 'Web Systems Evolution' (WSE), 2010, IEEE, 73-81.

[19] You, I. and Yim, K. (2010) Malware Obfuscation Techniques: A Brief Survey, in 'Broadband, Wireless Computing, Communication and Applications' (BWCCA), 297-300.

[20] Zheng, X. and Fang, Y. (2010), An AIS-based cloud security model, in 'Intelligent Control and Information Processing' (ICICIP), 153-158.
http://dx.doi.org/10.1109/ICICIP.2010.5564193

[21] Cyveillance, accessed on 19th May 2011, http://www.cyveillance.com/web/news/press_rel/2010/2010-08-04.asp




Journal of Networks (JNW, ISSN 1796-2056)

Copyright © 2006-2013 by ACADEMY PUBLISHER – All rights reserved.