Better word representation vectors using syllabic alphabet: a case study of Swahili


dc.contributor.author Shikali, Casper S.
dc.contributor.author Sijie, Zhou
dc.contributor.author Qihe, Liu
dc.contributor.author Mokhosi, Refuoe
dc.date.accessioned 2020-04-29T08:07:59Z
dc.date.available 2020-04-29T08:07:59Z
dc.date.issued 2019-09
dc.identifier.citation Applied Sciences; 9(18), 3648. en_US
dc.identifier.issn 2076-3417
dc.identifier.uri https://www.mdpi.com/2076-3417/9/18/3648/pdf
dc.identifier.uri http://repository.seku.ac.ke/handle/123456789/6031
dc.description DOI:10.3390/app9183648 en_US
dc.description.abstract Deep learning has been used extensively in natural language processing, with sub-word representation vectors playing a critical role. However, this cannot be said of Swahili, which is a low-resource yet widely spoken language in East and Central Africa. This study proposed novel word embeddings from syllable embeddings (WEFSE) for Swahili to address the concern of word representation for agglutinative and syllabic-based languages. Inspired by the learning methodology of Swahili in beginner classes, we encoded respective syllables instead of characters, character n-grams or morphemes of words and generated quality word embeddings using a convolutional neural network. The quality of WEFSE was demonstrated by the state-of-the-art results of the syllable-aware language model on both the small dataset (31.229 perplexity value) and the medium dataset (45.859 perplexity value), outperforming character-aware language models. We further evaluated the word embeddings using a word analogy task. To the best of our knowledge, syllabic alphabets have not previously been used to compose word representation vectors. Therefore, the main contributions of the study are a syllabic alphabet, WEFSE, a syllable-aware language model and a word analogy dataset for Swahili. en_US
dc.language.iso en en_US
dc.publisher MDPI en_US
dc.subject syllabic alphabet en_US
dc.subject word representation vectors en_US
dc.subject deep learning en_US
dc.subject syllable-aware language model en_US
dc.subject perplexity en_US
dc.subject word analogy en_US
dc.title Better word representation vectors using syllabic alphabet: a case study of Swahili en_US
dc.type Article en_US

