IsiZulu noun classification based on replicating the ensemble approach for Runyankore

Mahlaza, Zola and Sayed, Imaan and van der Leek, Alexander and Keet, C. Maria (2025) IsiZulu noun classification based on replicating the ensemble approach for Runyankore, Proceedings of the First Workshop on Language Models for Low-Resource Languages (LoResLM 2025), 20 January 2025, Abu Dhabi, UAE, 469-478, Association for Computational Linguistics.

Abstract

A noun’s class is a crucial component in NLP for Niger-Congo B (NCB) languages, among others, because it governs agreement across the sentence. Computational models for determining a noun’s class are lacking because most NCB languages are poorly documented. Byamugisha (2022) proposed a promising data-driven approach for Runyankore that combines syntax and semantics. However, the code and data are inaccessible, and it remains to be seen whether the approach is suitable for other NCB languages. We address this by reproducing Byamugisha’s experiment for isiZulu. We conducted the reproduction as two independent experiments so that we could also subject it to a meta-analysis. Results showed that the approach was reproducible only in part, mainly due to imprecision in the original description and the current impossibility of generating the same kind of source data set from an existing grammar. The choices made in attempting to reproduce the pipeline, as well as differences in the choice of training and test data, had a large effect on the eventual accuracy of noun class disambiguation, but the approach could reach an accuracy of 83%, in the same range as Runyankore.

Item Type: Conference paper
Subjects: Computing methodologies > Artificial intelligence > Natural language processing
Computing methodologies > Machine learning
Date Deposited: 13 Oct 2025 12:25
Last Modified: 13 Oct 2025 12:25
URI: https://pubs.cs.uct.ac.za/id/eprint/1755
