Please use this identifier to cite or link to this item: https://repository.seku.ac.ke/handle/123456789/7530
Full metadata record
DC Field | Value | Language
dc.contributor.author | Shikali, Casper S. | -
dc.contributor.author | Mokhosi, Refuoe | -
dc.date.accessioned | 2024-03-25T12:57:01Z | -
dc.date.available | 2024-03-25T12:57:01Z | -
dc.date.issued | 2020-08 | -
dc.identifier.citation | Data in Brief, Volume 31, 105951, August 2020 | en_US
dc.identifier.issn | 2352-3409 | -
dc.identifier.uri | https://www.sciencedirect.com/science/article/pii/S2352340920308453 | -
dc.identifier.uri | http://repository.seku.ac.ke/xmlui/handle/123456789/7530 | -
dc.description | https://doi.org/10.1016/j.dib.2020.105951 | en_US
dc.description.abstract | Language modelling using neural networks requires adequate data to guarantee quality word representation, which is important for natural language processing (NLP) tasks. However, African languages, Swahili in particular, have been disadvantaged, and most of them are classified as low-resource languages because of inadequate data for NLP. In this article, we derive and contribute an unannotated Swahili dataset, a Swahili syllabic alphabet and a Swahili word analogy dataset to address the need for language processing resources, especially for low-resource languages. We derive the unannotated Swahili dataset by pre-processing raw Swahili data using a Python script, formulate the syllabic alphabet, and develop the Swahili word analogy dataset based on an existing English dataset. We envisage that the datasets will support not only language models but also other downstream NLP tasks such as part-of-speech tagging, machine translation and sentiment analysis. | en_US
dc.language.iso | en | en_US
dc.publisher | Elsevier | en_US
dc.subject | Natural language processing | en_US
dc.subject | Deep learning | en_US
dc.subject | Language modelling | en_US
dc.subject | Unannotated data | en_US
dc.subject | Word analogy | en_US
dc.subject | Syllables | en_US
dc.subject | Neural networks | en_US
dc.title | Enhancing African low-resource languages: Swahili data for language modelling | en_US
dc.type | Article | en_US
Appears in Collections:School of Science and Computing (JA)

Files in This Item:
File | Description | Size | Format
Shikali_Enhancing African low-resource languages....pdf | Abstract | 3.59 kB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.