Classifying Documents with Maximum Likelihood Approximation of the Dirichlet Multinomial Gibbs Model
Abstract
In text analysis, the Dirichlet compound multinomial (DCM) distribution has recently been shown to be a good model for documents because, unlike the standard multinomial distribution, it captures the phenomenon of word burstiness. Burstiness describes the behavior of a rare word appearing many times in a single document. In this paper, to improve document modeling performance, we propose the Dirichlet multinomial Gibbs (DMG) model, a variant that combines the DCM and Gibbs distributions by introducing Gibbs parameters into the DCM distribution. We present the maximum likelihood procedure for the DMG model with these Gibbs parameters. Our experiments indicate that the DMG approach inherits the merits of both Gibbs distribution approximation and DCM estimation. More specifically, experimental results on various real-world text datasets show that the maximum likelihood approximation of the DMG model compares favorably with several current state-of-the-art methods.
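To make the burstiness claim concrete, the following minimal sketch evaluates the standard DCM (Pólya) log-likelihood of a word-count vector and compares it with a plain multinomial. It covers only the baseline DCM distribution, not the DMG model, whose Gibbs parameters are not specified in this abstract. The function names, the four-word vocabulary, and the parameter values are illustrative assumptions.

```python
from math import lgamma, log

def dcm_log_likelihood(counts, alpha):
    """Log-probability of a count vector under the standard DCM (Polya) distribution.

    counts: word counts for one document (hypothetical toy data below).
    alpha:  positive Dirichlet parameters, one per vocabulary word.
    """
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient n! / prod(x_w!)
    log_coeff = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    # Gamma(sum alpha) / Gamma(n + sum alpha)
    log_norm = lgamma(a) - lgamma(n + a)
    # prod Gamma(x_w + alpha_w) / Gamma(alpha_w)
    log_terms = sum(lgamma(x + al) - lgamma(al) for x, al in zip(counts, alpha))
    return log_coeff + log_norm + log_terms

def multinomial_log_likelihood(counts, probs):
    """Log-probability of a count vector under a standard multinomial."""
    n = sum(counts)
    log_coeff = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    return log_coeff + sum(x * log(p) for x, p in zip(counts, probs) if x > 0)

# Toy documents of length 4 over a 4-word vocabulary (assumed for illustration).
bursty = [4, 0, 0, 0]   # one word repeated four times
spread = [1, 1, 1, 1]   # four different words, once each

alpha = [0.1, 0.1, 0.1, 0.1]        # small alphas make the DCM favor repetition
uniform = [0.25, 0.25, 0.25, 0.25]  # multinomial with equal word probabilities

print("DCM         bursty vs spread:",
      dcm_log_likelihood(bursty, alpha), dcm_log_likelihood(spread, alpha))
print("Multinomial bursty vs spread:",
      multinomial_log_likelihood(bursty, uniform),
      multinomial_log_likelihood(spread, uniform))
```

With these assumed values, the DCM assigns the higher likelihood to the bursty document while the multinomial assigns it to the spread-out one, which is the qualitative difference the abstract refers to.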


