A Management System for Integrated Querying of Web Databases

Berman, S. and Rouse, C. (2008) A Management System for Integrated Querying of Web Databases, Proceedings of 10th Annual Conference of WWW Applications, 2nd - 5th September 2008, University of Cape Town.

Abstract

The World Wide Web has revolutionised access to information, but the data residing in databases connected to the Web, the so-called Deep Web or Hidden Web, is about 500 times larger than the Surface Web. While these databases can be accessed individually via query forms on their Web pages, we need architectures and tools that make it possible to query all relevant databases on the World Wide Web. This paper proposes such an architecture, based on a superpeer topology, and a set of intelligent tools to facilitate more automated access to Web databases. While complete automation is unlikely in the near future, a suitable framework and toolkit are necessary to maximise automation and reusability, and to minimise the effort required of the human user. Our system is based on a peer-to-peer framework, with tools that exploit this topology in analysing, configuring and tuning components for a particular application domain, such as travel or real estate. Components for automated query processing, schema matching, schema translation and data transformation are presented. The prototype system uses a mediated schema approach to integrated querying of multiple databases, and includes a query interface which allows results from different databases to be viewed together or in separate tabs as they are received from each data source. We describe the prototype implementation and evaluate its usability, performance and accuracy in an initial experiment involving 32 independently constructed databases. Our findings highlight the benefits of a semi-automated approach to integrated querying of the Deep Web.
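
As an illustration of the mediated schema approach mentioned in the abstract, the following minimal Python sketch shows one way a query against a shared schema could be translated for each source and the results transformed back. It is not the paper's implementation: the `Source` class, the attribute names (`city`, `price`, `town`, `cost_cents`) and the sample data are hypothetical, and in-memory lists stand in for Web databases reached through query forms.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Source:
    """A hypothetical data source with its own local schema."""
    name: str
    field_map: Dict[str, str]            # mediated attribute -> local attribute
    rows: List[Dict[str, object]]        # stand-in for a remote Web database
    transforms: Dict[str, Callable] = field(default_factory=dict)  # per mediated attribute


def query_source(query: Dict[str, object], source: Source) -> List[Dict[str, object]]:
    """Answer a mediated-schema equality query against one source."""
    # Schema translation: a source that lacks a queried attribute cannot answer.
    if any(attr not in source.field_map for attr in query):
        return []
    local_query = {source.field_map[a]: v for a, v in query.items()}
    hits = [r for r in source.rows
            if all(r.get(k) == v for k, v in local_query.items())]

    # Data transformation: rename local attributes and convert values back.
    to_mediated = {local: med for med, local in source.field_map.items()}
    results = []
    for row in hits:
        out = {}
        for local, value in row.items():
            if local in to_mediated:
                med = to_mediated[local]
                convert = source.transforms.get(med, lambda x: x)
                out[med] = convert(value)
        results.append(out)
    return results


def integrated_query(query, sources):
    """Group results per source, much like one tab per data source."""
    return {s.name: query_source(query, s) for s in sources}


# Hypothetical real-estate sources with differing schemas and units.
agency_a = Source(
    name="agency_a",
    field_map={"city": "town", "price": "cost_cents"},
    rows=[{"town": "Cape Town", "cost_cents": 250000},
          {"town": "Durban", "cost_cents": 180000}],
    transforms={"price": lambda cents: cents / 100},
)
agency_b = Source(
    name="agency_b",
    field_map={"city": "location", "price": "price_rand"},
    rows=[{"location": "Cape Town", "price_rand": 2700.0}],
)

results = integrated_query({"city": "Cape Town"}, [agency_a, agency_b])
```

Here the field mappings are written by hand; in the system described above, schema matching tools would help derive them semi-automatically.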

Item Type: Conference paper
Subjects: Information systems > Data management systems
Date Deposited: 24 Nov 2008
Last Modified: 10 Oct 2019 15:34
URI: http://pubs.cs.uct.ac.za/id/eprint/503
