UCT CS Research Document Archive

BantuWeb: A Digital Library for Resource Scarce South African Languages

von Holy, Andreas, Alon Bresler, Osher Shuman, Catherine Chavula and Hussein Suleman (2017) BantuWeb: A Digital Library for Resource Scarce South African Languages. In Proceedings of the Annual Conference of the South African Institute of Computer Scientists and Information Technologists (SAICSIT 2017), Thaba 'Nchu, South Africa.

South Africa is a linguistically diverse country: it is home to 11 official languages, of which nine (all except English and Afrikaans) are Resource Scarce Languages (RSLs). Accordingly, many South Africans struggle to access information written in their native languages on the Web. Unfortunately, lack of access to information hinders socio-economic growth. This paper proposes a Web-based digital library to act as a central repository for content written in these languages that is crawled from the Web, and generated or contributed by a community of users. Gamification features have been incorporated into the digital library to motivate users to contribute content, both to strengthen the collection of resources and to increase community participation. Specifically, the paper: (i) proposes a ranking algorithm, smart interleaving, to aggregate and rank multilingual search results effectively from collections of varying size; and (ii) investigates which gamification features, among leaderboards, notifications, virtual points and levels, motivate users to contribute content in the context of South African RSLs. The results show that users were more motivated to contribute content by reaching the next level than by improving their leaderboard ranking or virtual points. Further, the overall results on merging and ranking multilingual search results show no significant improvement from using smart interleaving.

EPrint Type: Conference Paper
Keywords: Digital Libraries, Gamification, Crowdsourcing, Multilingual Information Retrieval, Search Engines, Information Retrieval Evaluation, Web Crawling, Language Preservation
ID Code: 1226
Deposited By: Suleman, Hussein
Deposited On: 25 November 2017
Alternative Locations: https://dl.acm.org/citation.cfm?id=3129446