Please use this identifier to cite or link to this item: https://repository.seku.ac.ke/handle/123456789/6031
Full metadata record
dc.contributor.author: Shikali, Casper S.
dc.contributor.author: Sijie, Zhou
dc.contributor.author: Qihe, Liu
dc.contributor.author: Mokhosi, Refuoe
dc.date.accessioned: 2020-04-29T08:07:59Z
dc.date.available: 2020-04-29T08:07:59Z
dc.date.issued: 2019-09
dc.identifier.citation: Applied Sciences; 9(18), 3648 [en_US]
dc.identifier.issn: 2076-3417
dc.identifier.uri: https://www.mdpi.com/2076-3417/9/18/3648/pdf
dc.identifier.uri: http://repository.seku.ac.ke/handle/123456789/6031
dc.description: DOI: 10.3390/app9183648 [en_US]
dc.description.abstract: Deep learning has been used extensively in natural language processing, with sub-word representation vectors playing a critical role. However, the same cannot be said of Swahili, a low-resource but widely spoken language in East and Central Africa. This study proposed novel word embeddings from syllable embeddings (WEFSE) for Swahili to address the problem of word representation in agglutinative, syllable-based languages. Inspired by how Swahili is taught in beginner classes, we encoded the syllables of words, rather than characters, character n-grams or morphemes, and generated quality word embeddings using a convolutional neural network. The quality of WEFSE is demonstrated by state-of-the-art results from the syllable-aware language model on both the small dataset (perplexity of 31.229) and the medium dataset (perplexity of 45.859), outperforming character-aware language models. We further evaluated the word embeddings using a word analogy task. To the best of our knowledge, syllabic alphabets have not previously been used to compose word representation vectors. The main contributions of the study are therefore a syllabic alphabet, WEFSE, a syllable-aware language model and a word analogy dataset for Swahili. [en_US]
dc.language.isoenen_US
dc.publisherMDPIen_US
dc.subjectsyllabic alphabeten_US
dc.subjectword representation vectorsen_US
dc.subjectdeep learningen_US
dc.subjectsyllable-aware language modelen_US
dc.subjectperplexityen_US
dc.subjectword analogyen_US
dc.titleBetter word representation vectors using syllabic alphabet: a case study of Swahilien_US
dc.typeArticleen_US
Appears in Collections: School of Science and Computing (JA)

Files in This Item:
File: Shikali_Better word representation vectors using syllabic alphabet.pdf
Description: Full Text
Size: 1.71 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.