An Improved Algorithm of Bayesian Text Categorization
Abstract
Text categorization is a fundamental technique in text mining and has been an active research topic in data mining and web mining in recent years. It plays an important role in traditional information retrieval, web indexing architectures, web information retrieval, and related applications. This paper presents an improved text categorization algorithm that combines feature weighting with the Naïve Bayesian classifier. Experimental results show that weighting features with the improved Gini index algorithm effectively improves the performance of the Naïve Bayesian classifier. The algorithm has been applied successfully in a sensitive information recognition system.
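The abstract does not state the weighting formula, so the following is only a minimal sketch of how Gini-index feature weights might be combined with a multinomial Naïve Bayesian classifier. It assumes one form of the Gini index that appears in the text feature selection literature, Gini(t) = Σ_c P(t|c)² · P(c|t)², estimated from document frequencies, and it assumes the weight scales each term's log-likelihood. The function names `train` and `classify`, the toy documents, and the class labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def train(docs, labels):
    """Fit a multinomial Naive Bayes model plus one Gini weight per term."""
    classes = sorted(set(labels))
    vocab = sorted({t for d in docs for t in d})
    class_size = Counter(labels)
    prior = {c: class_size[c] / len(docs) for c in classes}

    term_counts = {c: Counter() for c in classes}  # term frequencies per class
    doc_freq = {c: Counter() for c in classes}     # document frequencies per class
    for doc, c in zip(docs, labels):
        term_counts[c].update(doc)
        doc_freq[c].update(set(doc))

    # Laplace-smoothed log P(t|c)
    log_like = {}
    for c in classes:
        total = sum(term_counts[c].values()) + len(vocab)
        log_like[c] = {t: math.log((term_counts[c][t] + 1) / total) for t in vocab}

    # Assumed weight (not confirmed by the abstract):
    # Gini(t) = sum over classes of P(t|c)^2 * P(c|t)^2
    weight = {}
    for t in vocab:
        df_t = sum(doc_freq[c][t] for c in classes)
        weight[t] = sum(
            (doc_freq[c][t] / class_size[c]) ** 2 * (doc_freq[c][t] / df_t) ** 2
            for c in classes
        )
    return prior, log_like, weight


def classify(doc, model):
    """Return the class with the highest weighted log-posterior score."""
    prior, log_like, weight = model
    scores = {}
    for c in prior:
        score = math.log(prior[c])
        for t in doc:
            if t in weight:  # terms unseen in training are skipped
                score += weight[t] * log_like[c][t]
        scores[c] = score
    return max(scores, key=scores.get)


# Toy, made-up training data for illustration only.
docs = [
    ["bomb", "attack", "plan"],
    ["attack", "threat", "the"],
    ["meeting", "lunch", "plan"],
    ["lunch", "menu", "the"],
]
labels = ["sensitive", "sensitive", "normal", "normal"]
model = train(docs, labels)
print(classify(["attack", "plan"], model))
```

Under this assumed weighting, a term that occurs in every document of one class and in no other class receives the maximum weight of 1, while a term spread evenly across classes receives a small weight and therefore has little influence on the classification score.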


