Enhancing African low-resource languages: Swahili data for language modelling

dc.contributor.author Shikali, Casper S.
dc.contributor.author Mokhosi, Refuoe
dc.date.accessioned 2024-03-25T12:57:01Z
dc.date.available 2024-03-25T12:57:01Z
dc.date.issued 2020-08
dc.identifier.citation Data in Brief, Volume 31, 105951, August 2020 en_US
dc.identifier.issn 2352-3409
dc.identifier.uri https://www.sciencedirect.com/science/article/pii/S2352340920308453
dc.identifier.uri http://repository.seku.ac.ke/xmlui/handle/123456789/7530
dc.description https://doi.org/10.1016/j.dib.2020.105951 en_US
dc.description.abstract Language modelling using neural networks requires adequate data to guarantee quality word representations, which are important for natural language processing (NLP) tasks. However, African languages, Swahili in particular, have been disadvantaged, and most of them are classified as low-resource languages because of inadequate data for NLP. In this article, we derive and contribute an unannotated Swahili dataset, a Swahili syllabic alphabet and a Swahili word analogy dataset to address the need for language processing resources, especially for low-resource languages. We derive the unannotated Swahili dataset by pre-processing raw Swahili data using a Python script, formulate the syllabic alphabet, and develop the Swahili word analogy dataset based on an existing English dataset. We envisage that the datasets will support not only language models but also other downstream NLP tasks such as part-of-speech tagging, machine translation and sentiment analysis. en_US
dc.language.iso en en_US
dc.publisher Elsevier en_US
dc.subject Natural language processing en_US
dc.subject Deep learning en_US
dc.subject Language modelling en_US
dc.subject Unannotated data en_US
dc.subject Word analogy en_US
dc.subject Syllables en_US
dc.subject Neural networks en_US
dc.title Enhancing African low-resource languages: Swahili data for language modelling en_US
dc.type Article en_US

