Journal of Computers, Vol 5, No 4 (2010), 654-661, Apr 2010
doi:10.4304/jcp.5.4.654-661

Classifying Documents with Maximum Likelihood Approximation of the Dirichlet Multinomial Gibbs Model

Shibin Zhou, Zhao Cao, Yushu Liu

Abstract


In text analysis, the Dirichlet compound multinomial (DCM) distribution has recently been shown to be a good model for documents because, unlike the standard multinomial distribution, it captures the phenomenon of word burstiness: the tendency of a rare word to appear many times in a single document. To improve document modeling, we propose a variant that combines the DCM and Gibbs distributions, called the Dirichlet multinomial Gibbs (DMG) model, obtained by introducing Gibbs parameters into the DCM distribution. We present the maximum likelihood estimation procedure for the DMG model with these Gibbs parameters. Our experiments show that the DMG approach inherits the merits of both Gibbs distribution approximation and DCM estimation. More specifically, experimental results on several real-world text datasets show that maximum likelihood approximation of the DMG model compares favorably with some current state-of-the-art methods.
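The burstiness argument can be illustrated with the standard DCM (Pólya) likelihood, p(x | α) = n!/∏x_w! · Γ(A)/Γ(n + A) · ∏ Γ(x_w + α_w)/Γ(α_w) with A = Σ α_w. The sketch below is not the authors' DMG model: the abstract does not define the Gibbs parameters, so they are omitted. It compares the DCM against a multinomial with the same mean word probabilities, and adds a simple classifier that picks the class whose DCM parameters give the highest likelihood. The function names, the uniform class prior, and the class labels and parameter values in the usage lines are assumptions for illustration only.

```python
import math


def dcm_log_likelihood(counts, alpha):
    """Log-probability of a word-count vector under a DCM with parameters alpha."""
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient n! / prod(x_w!)
    ll = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Dirichlet normaliser ratio Gamma(A) / Gamma(n + A)
    ll += math.lgamma(a) - math.lgamma(n + a)
    # Per-word terms Gamma(x_w + alpha_w) / Gamma(alpha_w); zero counts contribute 0
    ll += sum(math.lgamma(x + al) - math.lgamma(al)
              for x, al in zip(counts, alpha) if x > 0)
    return ll


def multinomial_log_likelihood(counts, theta):
    """Log-probability of a word-count vector under a multinomial with probabilities theta."""
    n = sum(counts)
    ll = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    ll += sum(x * math.log(p) for x, p in zip(counts, theta) if x > 0)
    return ll


def classify(counts, class_alphas):
    """Hypothetical DCM classifier: highest log-likelihood wins (uniform class prior assumed)."""
    return max(class_alphas, key=lambda c: dcm_log_likelihood(counts, class_alphas[c]))


if __name__ == "__main__":
    alpha = [0.2, 0.2, 0.2]           # small alpha: strongly bursty DCM
    theta = [a / sum(alpha) for a in alpha]  # multinomial with the same mean
    bursty = [4, 0, 0]                # one word repeated four times
    spread = [2, 1, 1]                # same length, words spread out
    # The DCM favours the bursty document; the multinomial favours the spread one.
    print("DCM:        ", dcm_log_likelihood(bursty, alpha), dcm_log_likelihood(spread, alpha))
    print("Multinomial:", multinomial_log_likelihood(bursty, theta),
          multinomial_log_likelihood(spread, theta))
    # Hypothetical class parameters, for illustration only.
    classes = {"sports": [2.0, 0.1, 0.1], "politics": [0.1, 2.0, 0.1]}
    print(classify(bursty, classes))
```

With small α_w, a word that has already occurred makes further occurrences of the same word more likely, which is why the DCM assigns higher probability to the repeated-word document even though both models have the same expected word proportions.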



Keywords


Document classification, Dirichlet compound multinomial model, Gibbs distribution



Journal of Computers (JCP, ISSN 1796-203X)

Copyright © 2006-2012 by ACADEMY PUBLISHER – All rights reserved.