Journal of Computers, Vol 5, No 1 (2010), 59-68, Jan 2010
doi:10.4304/jcp.5.1.59-68

Design and Analysis of an Effective Corpus for Evaluation of Bengali Text Compression Schemes

Md. Rafiqul Islam, S. A. Ahsan Rajon

Abstract


In this paper, we propose an effective platform for evaluating Bengali text compression schemes, together with a novel scheme for constructing a Bengali text compression corpus. We present a methodical study of approaches to formulating text corpora for data compression and introduce a corpus named Ekushe-Khul for evaluating Bengali text compression schemes. To design the corpus, the Type to Token Ratio (TTR) is used as the selection criterion, along with a number of secondary considerations. The paper also presents a mathematical analysis of data compression performance in relation to the structural aspects of corpora, and a comprehensive analysis of the evolving criteria for text compression corpora and of related issues in designing dictionary-based compression. The proposed corpus is effective for evaluating the compression efficiency of small and medium-sized Bengali text files.
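
As an illustration of the selection criterion named above, the Type to Token Ratio is conventionally the number of distinct word forms (types) divided by the total number of words (tokens). The abstract does not state how the authors tokenize Bengali text, so the sketch below assumes plain whitespace splitting; the function name `type_token_ratio` and the sample sentence are hypothetical and are not taken from the paper.

```python
def type_token_ratio(text: str) -> float:
    """Return distinct tokens (types) divided by total tokens.

    Assumption: tokens are whitespace-separated words. The paper's own
    tokenization rules are not given in the abstract.
    """
    tokens = text.split()
    if not tokens:
        # An empty text has no tokens; return 0.0 rather than divide by zero.
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical sample: six tokens, four of them distinct.
sample = "আমি ভাত খাই আমি মাছ খাই"
ttr = type_token_ratio(sample)
print(f"TTR = {ttr:.3f}")
```

A lower ratio indicates more repeated words, which is the kind of redundancy a dictionary-based compressor can exploit.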



Keywords


Corpus, Bengali Text, Bengali Text Compression, Dictionary Coding, Data Management, Evaluation Platform, Compression Efficiency, Type to Token Ratio (TTR)



Journal of Computers (JCP, ISSN 1796-203X)

Copyright © 2006-2011 by ACADEMY PUBLISHER. All rights reserved.