Integrated Query of the Hidden Web

Berman, S., Kamkuemah, M. and Muntunemuine, J. (2009) Integrated Query of the Hidden Web. Proceedings of the 11th Annual Conference on World Wide Web Applications, 2-4 September 2009, Port Elizabeth, South Africa, pp. 4-15.

There is a need for software that can access multiple Websites through a single, common interface. This would allow users, for example, to compare flights for a particular trip across all relevant airline sites by posing a single query. This paper investigates automating this process in the case of airline databases hidden behind the Web (the so-called Deep Web or Hidden Web). We first constructed a prototype for integrated query of a handful of pre-determined airline sites. This proved useful in detecting commonalities and differences in the sites, and in selecting the most suitable technologies for working with multiple forms. A generic system was then designed, and components of the prototype were incrementally replaced by domain-specific tools able to handle arbitrary airline sites. Our results were promising as regards result interpretation, with 89% of response pages successfully handled. However, query formulation presented many problems, with only 39% of query interfaces automatically interpreted correctly, and even fewer amenable to automated query propagation. We conclude that integrated access to the Hidden Web is considerably more challenging than crawling the Surface Web, and that domain-specific systems are a promising approach to full automation.

Item Type: Conference paper
Additional Information: ISBN: 978-0-620-45215-1
Subjects: Information systems > Information retrieval
Date Deposited: 06 Dec 2009
Last Modified: 10 Oct 2019 15:34
