A Novel Class-Based Data Fusion Technique for Information Retrieval
Abstract
Data fusion in information retrieval combines the results of multiple retrieval models or document representations. The effectiveness of a data fusion technique depends on the quality of its inputs: classical techniques fail to improve retrieval when the quality of the input results varies widely, from low to high. In this paper we address this problem of high quality variation among retrieval strategies or document representations, which affects the combination of their outputs. Our investigation on the MALACH speech collection, in which different segment representations are available, shows that neither classical data fusion (CombSUM) nor its weighted version (WCombSUM) improves retrieval. We propose a novel class-based data fusion technique to deal with this issue. The segments retrieved by models based on different document representations are classified by segment quality into three classes (high, intermediate, and low quality); the similarity scores of each segment are then fused using the classical CombSUM. Our experimental results show that the new technique is significantly better than CombSUM or WCombSUM at combining results with high quality variation.
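For readers unfamiliar with the two baselines named above, the following is a minimal sketch of CombSUM and a weighted variant, not the implementation used in the paper. It assumes each retrieval run is a mapping from segment ID to similarity score and that scores are min-max normalized per run before summation. The normalization choice, the function names (`min_max_normalize`, `comb_sum`), the weights, and the example runs are illustrative assumptions. The class-based step is not sketched, because the abstract does not specify how segments are assigned to quality classes.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Scale one run's scores to [0, 1] (assumed normalization, not from the paper)."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        # All scores equal: treat every retrieved segment as equally relevant.
        return {seg: 1.0 for seg in run}
    return {seg: (score - lo) / (hi - lo) for seg, score in run.items()}


def comb_sum(runs, weights=None):
    """Fuse runs by summing normalized scores per segment.

    With weights=None this is plain CombSUM; passing one weight per run
    gives a weighted variant in which each run's scores are scaled first.
    Returns (segment, fused_score) pairs, best first.
    """
    if weights is None:
        weights = [1.0] * len(runs)
    if len(weights) != len(runs):
        raise ValueError("need exactly one weight per run")
    fused = defaultdict(float)
    for run, weight in zip(runs, weights):
        for seg, score in min_max_normalize(run).items():
            # A segment missing from a run simply contributes nothing for that run.
            fused[seg] += weight * score
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical runs over two representations of the same segments.
run_a = {"s1": 12.0, "s2": 8.0, "s3": 4.0}
run_b = {"s2": 3.0, "s3": 2.0, "s4": 1.0}

unweighted = comb_sum([run_a, run_b])
weighted = comb_sum([run_a, run_b], weights=[1.0, 0.25])
```

In this toy example the unweighted fusion ranks `s2` first because both runs retrieve it, while down-weighting the second run moves `s1` back to the top. This shows how strongly the fused ranking depends on the relative trust placed in each input.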


