Journal of Emerging Technologies in Web Intelligence, Vol 2, No 3 (2010), 160-166, Aug 2010
doi:10.4304/jetwi.2.3.160-166

A Novel Class-Based Data Fusion Technique for Information Retrieval

Muath Alzghool, Diana Inkpen

Abstract


Data fusion in information retrieval combines the results from multiple retrieval models or document representations. The success of a data fusion technique depends on the quality of its inputs; classical data fusion techniques fail to improve retrieval when the quality of the retrieval results varies from low to high. In this paper we address the problem of high variation among retrieval strategies or document representations, which affects the combination of their outputs. Our investigation on the MALACH speech collection, in which different segment representations are available, shows that neither classical data fusion (CombSUM) nor its weighted version (WCombSUM) improves retrieval. We propose a novel class-based data fusion technique to deal with this issue. The segments retrieved by models based on different document representations are classified according to segment quality into three classes: high, intermediate, and low quality; then the similarity scores of each segment are fused using the classical CombSUM. Our experimental results show that the new technique is significantly better than CombSUM or WCombSUM at combining results with high quality variation.
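As a rough illustration of the baselines named in the abstract, CombSUM and a weighted variant can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the min-max normalization, the function names, and the dictionary-based run format are all illustrative choices. The abstract does not say how segments are assigned to quality classes or how the classes steer the fusion, so that step is not shown.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Rescale one run's scores to [0, 1] (assumed normalization scheme)."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        # A run with identical scores carries no ranking signal; map all to 1.0.
        return {seg: 1.0 for seg in run}
    return {seg: (score - lo) / (hi - lo) for seg, score in run.items()}


def comb_sum(runs, weights=None):
    """Fuse runs by summing each segment's normalized scores.

    With weights=None every run counts equally (CombSUM); passing one
    weight per run gives a weighted sum in the style of WCombSUM.
    Each run is a hypothetical {segment_id: similarity_score} mapping.
    """
    if weights is None:
        weights = [1.0] * len(runs)
    fused = defaultdict(float)
    for run, weight in zip(runs, weights):
        for seg, score in min_max_normalize(run).items():
            fused[seg] += weight * score
    # Highest fused score first; ties broken by segment id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Two toy runs over different representations of the same segments.
    run_a = {"s1": 12.0, "s2": 8.0, "s3": 4.0}
    run_b = {"s2": 3.0, "s3": 2.0, "s4": 1.0}
    print(comb_sum([run_a, run_b]))
    print(comb_sum([run_a, run_b], weights=[1.0, 0.25]))
```

A segment missing from a run simply contributes nothing for that run, which is one common convention. When one run is much weaker than the others, its normalized scores still enter the sum at full strength, which is the kind of situation the abstract describes as problematic for these baselines.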



Keywords


Information storage and retrieval, searching spontaneous speech transcriptions, data fusion


Journal of Emerging Technologies in Web Intelligence (JETWI, ISSN 1798-0461)

Copyright © 2006-2013 by ACADEMY PUBLISHER. All rights reserved.