Journal of Computers, Vol 5, No 7 (2010), 995-1002, Jul 2010
doi:10.4304/jcp.5.7.995-1002

Utility Maximization Model for Deep Web Source Selection and Integration

Xuefeng Xian, Zhiming Cui, Pengpeng Zhao, Yuanfeng Yang, Guangming Zhang

Abstract


The World Wide Web is witnessing an increase in the amount of structured content: vast collections of structured data are on the rise due to the deep web. As a result, Internet-scale deep web data integration tasks are becoming increasingly common. In such tasks, a primary challenge is determining which web databases to include in the integration system. This paper presents a utility maximization model for source selection in deep web data integration. The model provides an efficient and effective way to estimate the approximate utility that integrating a web database brings to an integration system in a given state. The utility of a web database is synthesized from its positive and negative utility. With the estimated utility information, web databases can be selected by explicitly optimizing for high utility (include as much data, and as important data, as possible in the selected databases, while keeping their query cost as low as possible) in an iterative manner, where web databases are integrated incrementally. We experimentally demonstrate that our approach is efficient and finds high-utility data integration solutions.
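The iterative selection described above can be illustrated with a minimal sketch. It is not the paper's model: the abstract does not give the utility formulas, so the sketch assumes that positive utility is the number of records a database adds beyond those already covered, that negative utility is a weighted query cost, and that selection is greedy. The names `WebDatabase`, `marginal_utility`, `select_databases`, and `cost_weight` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WebDatabase:
    """Hypothetical description of a candidate web database."""
    name: str
    records: frozenset   # identifiers of the records the database exposes
    query_cost: float    # assumed cost of querying this database


def marginal_utility(db, covered, cost_weight=1.0):
    """Assumed utility: newly covered records minus weighted query cost."""
    positive = len(db.records - covered)      # data the system does not have yet
    negative = cost_weight * db.query_cost    # price of adding this source
    return positive - negative


def select_databases(candidates, max_sources, cost_weight=1.0):
    """Integrate databases one at a time, always taking the highest-utility one."""
    selected, covered = [], set()
    remaining = list(candidates)
    while remaining and len(selected) < max_sources:
        best = max(remaining,
                   key=lambda db: marginal_utility(db, covered, cost_weight))
        # Stop once no remaining database improves the system.
        if marginal_utility(best, covered, cost_weight) <= 0:
            break
        selected.append(best)
        covered |= best.records
        remaining.remove(best)
    return selected, covered


if __name__ == "__main__":
    candidates = [
        WebDatabase("A", frozenset({1, 2, 3, 4}), 1.0),
        WebDatabase("B", frozenset({3, 4, 5}), 0.5),
        WebDatabase("C", frozenset({1, 2}), 0.5),
    ]
    chosen, covered = select_databases(candidates, max_sources=3)
    print([db.name for db in chosen], sorted(covered))
```

Because utility is recomputed against the current state of the system at every step, a database whose records are already covered (such as `C` once `A` is integrated) is skipped even if it looked useful in isolation. Weighting records by importance, which the abstract mentions, is left out of this sketch.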



Keywords


deep web; data integration; utility maximization model; web database selection


Journal of Computers (JCP, ISSN 1796-203X)

Copyright © 2006-2012 by ACADEMY PUBLISHER – All rights reserved.