Chavula, Catherine and Suleman, Hussein (2021) Ranking by Language Similarity for Resource Scarce Southern Bantu Languages, Proceedings of 2021 ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR '21), 11 July 2021, Virtual Event, ACM.
Text: ictir077-chavulaA.pdf (Accepted Version, 322kB)
Abstract
Resource Scarce Languages (RSLs) lack sufficient resources to use Cross-Lingual Information Retrieval (CLIR) techniques and tools such as machine translation. Consequently, searching in RSLs is frustrating and often ends in unsuccessful, struggling searches. In such search tasks, search engines return low-quality results: relevant documents are either few and ranked low, or non-existent. Previous work has shown that alternative relevant results written in similar languages, including dialects, neighbouring languages and genetically related languages, can help multilingual RSL speakers complete their search tasks. To improve the quality of search results in this context, we propose re-ranking documents based on the similarity between the language of the document and the language of the query. Accordingly, we created a dataset of four Southern Bantu languages that includes documents, topics, topical relevance and intelligibility features, and document utility annotations. To understand the intelligibility dimension of the studied languages, we conducted online intelligibility test experiments and used the data for feature selection and intelligibility prediction. We evaluated re-ranking of search results offline, exploring Learning to Rank (LTR). Our results show that integrating topical relevance and intelligibility in ranking slightly improves retrieval effectiveness. Further, results on intelligibility prediction show that classifying intelligibility is feasible with fair accuracy.
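The abstract describes re-ranking search results by combining topical relevance with how similar a document's language is to the query's language. The sketch below is only a minimal illustration of that idea, not the authors' method (the paper explored Learning to Rank over relevance and intelligibility features). It assumes relevance scores already normalized to [0, 1], a hypothetical pairwise `similarity` table with scores in [0, 1], and a hand-set interpolation weight `alpha`; the names `Result`, `rerank`, and the placeholder language codes are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Result:
    doc_id: str
    doc_lang: str
    # Assumption: topical relevance from the base engine, normalized to [0, 1].
    relevance: float


def rerank(results, query_lang, similarity, alpha=0.7):
    """Sort results by a linear mix of topical relevance and language similarity.

    similarity: hypothetical table mapping (query_lang, doc_lang) -> score in [0, 1];
    unknown pairs score 0, and a document in the query language scores 1.
    alpha: weight on topical relevance; (1 - alpha) weights language similarity.
    """
    def score(r):
        if r.doc_lang == query_lang:
            lang_sim = 1.0
        else:
            lang_sim = similarity.get((query_lang, r.doc_lang), 0.0)
        return alpha * r.relevance + (1 - alpha) * lang_sim

    # Highest combined score first; sorted() is stable, so ties keep input order.
    return sorted(results, key=score, reverse=True)
```

For example, with made-up scores, querying in `"lang_a"` with results `d1` (`"lang_c"`, 0.9), `d2` (`"lang_a"`, 0.6) and `d3` (`"lang_b"`, 0.7), and similarities 0.8 for `("lang_a", "lang_b")` and 0.2 for `("lang_a", "lang_c")`, the order changes from d1, d3, d2 by relevance alone to d3, d2, d1. A learned ranker would replace the fixed `alpha` with weights fitted to relevance and utility judgements; the linear mix simply makes the trade-off visible.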
Item Type: | Conference paper |
---|---|
Uncontrolled Keywords: | Multilingual Information Retrieval; Retrieval Models and Ranking |
Subjects: | Information systems > Information retrieval > Specialized information retrieval > Structure and multilingual text search > Multilingual and cross-lingual retrieval |
Alternate Locations: | https://doi.org/10.1145/3471158.3472251 |
Date Deposited: | 03 Dec 2021 11:38 |
Last Modified: | 03 Dec 2021 11:38 |
URI: | https://pubs.cs.uct.ac.za/id/eprint/1510 |