Journal of Emerging Technologies in Web Intelligence, Vol 4, No 2 (2012), 181-188, May 2012
doi:10.4304/jetwi.4.2.181-188

Aggregated Search in XML Documents

Fatma Zohra Bessai-Mechmache, Zaia Alimazighi

Abstract


This paper addresses aggregated search in structured XML documents. We present a structured information retrieval model based on possibilistic networks, in which term-element and element-document relations are modeled through possibility and necessity. In this model, the user’s query triggers a propagation process that retrieves the elements. Instead of returning a list of elements that are each likely to answer the query only partially, our objective is to build virtual elements that combine relevant, non-redundant elements and that are likely to answer the query better than the elements taken separately. The possibilistic network structure provides a natural representation of the links between a document, its elements, and its content, and allows relevant and non-redundant elements to be selected automatically. We evaluated our approach on a sub-collection of INEX (INitiative for the Evaluation of XML retrieval) and report results on the impact of the aggregation approach.
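For readers unfamiliar with the two measures named above, the following minimal sketch illustrates the standard definitions from possibility theory on a finite set: the possibility of an event is the maximum degree among its outcomes, and its necessity is one minus the possibility of the complement. It is not the paper's network model. The element names and degrees in `relevance` are hypothetical values chosen only for illustration.

```python
def possibility(dist, event):
    """Possibility of an event: the largest degree among its outcomes."""
    return max((dist[w] for w in event), default=0.0)


def necessity(dist, event):
    """Necessity of an event: 1 minus the possibility of its complement."""
    complement = set(dist) - set(event)
    return 1.0 - possibility(dist, complement)


# Hypothetical, normalized possibility distribution (at least one degree is 1.0):
# how plausible it is that each XML element is the relevant one for some query.
relevance = {"title": 1.0, "abstract": 0.7, "section": 0.4, "footnote": 0.1}

event = {"title", "abstract"}
pi = possibility(relevance, event)   # plausibility that the answer lies in the event
nec = necessity(relevance, event)    # certainty that the answer lies in the event
print(pi, nec)
```

Because the distribution is normalized, the necessity of an event never exceeds its possibility, which is why the two measures are typically reported together as a lower and an upper bound on confidence.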



Keywords


XML Information Retrieval, possibilistic networks, aggregated search, redundancy


Journal of Emerging Technologies in Web Intelligence (JETWI, ISSN 1798-0461)

Copyright @ 2006-2013 by ACADEMY PUBLISHER – All rights reserved.