Journal of Software, Vol 6, No 7 (2011), 1368-1373, Jul 2011
doi:10.4304/jsw.6.7.1368-1373

An Empirical Study on Class Probability Estimates in Decision Tree Learning

Liangxiao Jiang, Chaoqun Li

Abstract


Decision trees are among the most effective and widely used models for classification and ranking, and they have received a great deal of attention from researchers in data mining and machine learning. A critical problem in decision tree learning is how to estimate class-membership probabilities from a decision tree. In this paper, we first survey class probability estimation methods, including the maximum-likelihood estimate, the Laplace estimate, the m-estimate, the similarity-weighted estimate, and the naive Bayes-based estimate. We then present an empirical study of the classification and ranking performance of the resulting decision trees under these different class probability estimation methods. Experimental results on a large number of UCI data sets verify our conclusions.
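
The three simplest estimates named above all compute P(c | leaf) from the class counts at the leaf that a test instance reaches. Let n be the number of training instances at the leaf, n_c the number of those in class c, k the number of classes, and p_c a prior probability for class c. The maximum-likelihood estimate is n_c / n, the Laplace estimate is (n_c + 1) / (n + k), and the m-estimate is (n_c + m * p_c) / (n + m). The Python sketch below assumes a leaf is represented simply as the list of training labels it contains. The function names `ml_estimate`, `laplace_estimate`, and `m_estimate` and the default m = 2 are hypothetical illustration choices, not taken from the paper. The sketch does not cover the similarity-weighted and naive Bayes-based estimates, which need more than the leaf's class counts.

```python
from collections import Counter
from typing import Dict, Hashable, Mapping, Sequence

# Hypothetical representation: class label -> estimated probability at one leaf.
Probs = Dict[Hashable, float]


def ml_estimate(leaf_labels: Sequence[Hashable], classes: Sequence[Hashable]) -> Probs:
    """Maximum-likelihood estimate: P(c | leaf) = n_c / n."""
    n = len(leaf_labels)
    if n == 0:
        # n_c / n is undefined when no training instance reaches the leaf.
        raise ValueError("maximum-likelihood estimate is undefined for an empty leaf")
    counts = Counter(leaf_labels)  # Counter returns 0 for classes absent from the leaf
    return {c: counts[c] / n for c in classes}


def laplace_estimate(leaf_labels: Sequence[Hashable], classes: Sequence[Hashable]) -> Probs:
    """Laplace estimate: P(c | leaf) = (n_c + 1) / (n + k), k = number of classes."""
    counts = Counter(leaf_labels)
    n, k = len(leaf_labels), len(classes)
    return {c: (counts[c] + 1) / (n + k) for c in classes}


def m_estimate(leaf_labels: Sequence[Hashable], priors: Mapping[Hashable, float],
               m: float = 2.0) -> Probs:
    """m-estimate: P(c | leaf) = (n_c + m * p_c) / (n + m).

    `priors` maps each class to p_c (assumed to sum to 1); m controls how
    strongly the leaf counts are pulled toward the priors.
    """
    counts = Counter(leaf_labels)
    n = len(leaf_labels)
    return {c: (counts[c] + m * p) / (n + m) for c, p in priors.items()}
```

For a pure leaf holding three instances of class "+" with classes {"+", "-"}, the maximum-likelihood estimate assigns probability 1 to "+" and 0 to "-", while the Laplace estimate assigns 4/5 and 1/5. The Laplace estimate is the special case of the m-estimate with m = k and uniform priors p_c = 1/k.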



Keywords


decision tree learning; probability estimation tree; class probability estimation; classification; ranking.

Journal of Software (JSW, ISSN 1796-217X)

Copyright © 2006-2012 by ACADEMY PUBLISHER – All rights reserved.