Automatically Generating IsiZulu Words From Indo-Arabic Numerals

Mahlaza, Z and Magwenzi, T and Keet, C.M. and Khumalo, L (2024) Automatically Generating IsiZulu Words From Indo-Arabic Numerals, Proceedings of 17th International Natural Language Generation Conference (INLG'24), 24-27 Sept 2024, Tokyo, Japan, in print, ACL.

Full text not available from this repository.

Abstract

Artificial conversational agents are deployed to assist humans in a variety of tasks. Some of these tasks, such as banking and scheduling appointments, require the agent to communicate numbers that are part of its internal, abstract representation of meaning. Current agents cannot do so for isiZulu: no algorithms exist for this, because speech and text data are lacking and the transformation is complex, possibly depending on the type of noun being counted. We solved this by extracting, and iteratively improving, the rules for speaking and writing numerals as words, and by creating two algorithms to automate the transformation. Evaluation of the algorithms by two isiZulu grammarians showed that six out of seven number categories were 90-100% correct. The same software was used with an additional set of rules to create a large monolingual text corpus of 771 643 sentences, to enable future data-driven approaches.
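The abstract does not spell out the two algorithms. As a rough illustration only, a rule-based numeral-to-words converter typically begins by splitting the Indo-Arabic numeral into place-value components and then hands each component, together with the class of the counted noun where there is one, to language-specific rules. The sketch below shows just that generic skeleton, assuming such a design. It encodes none of the paper's isiZulu rules, and the names `decompose`, `verbalise`, `Rule` and the `noun_class` label are hypothetical.

```python
from typing import Callable, Optional

# A place-value component: (digit, power of ten), e.g. (7, 2) stands for 700.
Component = tuple[int, int]


def decompose(numeral: str) -> list[Component]:
    """Split an Indo-Arabic numeral into its non-zero place-value components,
    most significant first, e.g. "2050" -> [(2, 3), (5, 1)]."""
    # isascii() guards against non-ASCII characters such as "²", which
    # str.isdigit() accepts but int() cannot parse.
    if not (numeral.isascii() and numeral.isdigit()):
        raise ValueError(f"expected ASCII digits only, got {numeral!r}")
    width = len(numeral)
    return [
        (int(ch), width - 1 - i)
        for i, ch in enumerate(numeral)
        if ch != "0"  # zero digits contribute no word of their own
    ]


# Hypothetical rule interface: maps one component, plus the (optional) class
# of the noun being counted, to a word form. The actual isiZulu rules,
# including any noun-dependent agreement, would live behind this interface.
Rule = Callable[[int, int, Optional[str]], str]


def verbalise(numeral: str, rule: Rule, noun_class: Optional[str] = None) -> list[str]:
    """Apply `rule` to every component of `numeral`.

    Joining the resulting words is left to the caller. A numeral consisting
    only of zeros yields an empty list, so zero needs a separate rule."""
    return [rule(digit, power, noun_class) for digit, power in decompose(numeral)]
```

For example, with a placeholder rule such as `lambda d, p, c: f"{d}e{p}:{c}"`, calling `verbalise("305", rule, noun_class="hypothetical")` yields one placeholder entry for the hundreds and one for the units. Keeping the noun class as an explicit argument reflects the abstract's point that the output may depend on the type of noun counted.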

Item Type: Conference paper
Subjects: Computing methodologies > Artificial intelligence > Natural language processing > Natural language generation
Date Deposited: 10 Aug 2024 13:33
Last Modified: 10 Aug 2024 13:33
URI: https://pubs.cs.uct.ac.za/id/eprint/1686
