NLAPOST2021 1st Shared Task on Part-of-Speech Tagging for Nguni Languages

Pannach, Franziska and Meyer, Francois and Jembere, Edgar and Sibon, Dlamini (2021) NLAPOST2021 1st Shared Task on Part-of-Speech Tagging for Nguni Languages, Proceedings of International Conference of the Digital Humanities Association of Southern Africa (DHASA) 2021.

Abstract

Part-of-speech (POS) tagging is the process of assigning a label to each word in a text to indicate its lexical category, based on the context in which it appears. POS tagging is considered a largely solved problem for languages with abundant NLP resources, such as English. It remains an open problem, however, for languages with fewer NLP resources, such as the Nguni languages, owing to the lack of large amounts of labelled data for training POS tagging models. The rich morphological structure and agglutinative nature of these languages also make POS tagging more challenging than it is for a language like English. With this in mind, we have organised a challenge for training POS tagging models on a limited amount of data for four Nguni languages: isiZulu, Siswati, isiNdebele, and isiXhosa.

Item Type: Conference paper
Subjects: Computing methodologies > Artificial intelligence > Natural language processing
Date Deposited: 09 Nov 2023 09:47
Last Modified: 09 Nov 2023 09:47
URI: https://pubs.cs.uct.ac.za/id/eprint/1601
