Indexing and Weighting of Multilingual and Mixed Documents

Ali, Mohammed Mustafa and Osman, Izzedin and Suleman, Hussein (2011) Indexing and Weighting of Multilingual and Mixed Documents, Proceedings of SAICSIT 2011, 3-5 October 2011, Cape Town, South Africa, 161-170, ACM.

Abstract

Non-English-speaking users, such as Arabic speakers, are not always able to express terminology in their native languages, especially in scientific domains. This difficulty forces many Arabic authors and scholars to use English terms to explain precise concepts, particularly when they address technical topics, resulting in mixed/multilingual queries with both English and Arabic terms. Cross Language Information Retrieval (CLIR) allows users to search for documents that are written in a language different from that of the query. However, current algorithms are optimized for monolingual queries, even if they are translated. This paper attempts to address the problem of multilingual querying in CLIR. New indexing and weighting techniques that are better suited to the unique characteristics of this problem are proposed. A new multilingual and mixed test collection containing mixed-language (Arabic and English) computer science documents and mixed-language queries has been created. Experimental results show that current CLIR techniques, which were not designed for these types of multilingual queries and documents, perform poorly, whereas the proposed techniques are promising.

Item Type: Conference paper
Subjects: Information systems
Information systems > Information retrieval
Alternate Locations: http://dx.doi.org/10.1145/2072221.2072240
Date Deposited: 12 Dec 2011
Last Modified: 10 Oct 2019 15:33
URI: http://pubs.cs.uct.ac.za/id/eprint/744
