A Sesotho news headlines dataset for sentiment analysis


dc.contributor.author Mokhosi, Refuoe
dc.contributor.author Shikali, Casper S.
dc.contributor.author Sethobane, Matello
dc.date.accessioned 2024-04-05T07:44:33Z
dc.date.available 2024-04-05T07:44:33Z
dc.date.issued 2024-03-27
dc.identifier.citation Data in Brief, 54, 110371, 27 March 2024 en_US
dc.identifier.uri https://www.sciencedirect.com/science/article/pii/S2352340924003408
dc.identifier.uri http://repository.seku.ac.ke/xmlui/handle/123456789/7537
dc.description https://doi.org/10.1016/j.dib.2024.110371 en_US
dc.description.abstract Sentiment Analysis (SA) is a subset of Natural Language Processing (NLP) that has become a promising research area, enabling the provision of language-specific services. Although research in high-resource languages such as English and Chinese has achieved promising results, research in low-resource African languages such as Sesotho is still in its infancy due to limited text and speech datasets. This study contributes in this regard by making available the Sesotho News (SN) dataset, an annotated dataset for the SA and Aspect Based Sentiment Analysis (ABSA) tasks. This dataset may be used for NLP research to benefit 1.85 million Sesotho speakers in Lesotho and 11.5 million speakers in South Africa. The dataset includes 4651 headlines for the ABSA task and 2401 headlines for the SA task, written in Lesotho's orthography of Sesotho. The news headlines were collected from Sesotho online newspapers and then annotated for the ABSA and SA tasks. Spearman's correlation and Cohen's Kappa index show good agreement between the annotators, implying that the SN dataset is of gold-standard quality. en_US
dc.language.iso en en_US
dc.publisher Elsevier en_US
dc.subject Sesotho dataset en_US
dc.subject News headlines en_US
dc.subject Sentiment analysis en_US
dc.subject Aspect based sentiment analysis en_US
dc.subject Natural language processing en_US
dc.subject Machine learning en_US
dc.title A Sesotho news headlines dataset for sentiment analysis en_US
dc.type Article en_US
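
The abstract reports inter-annotator agreement using Cohen's Kappa. As a rough illustration of what that statistic measures, the sketch below computes Cohen's Kappa for two annotators from first principles. The function name `cohens_kappa` and the labels `pos`, `neg`, and `neu` are hypothetical and chosen only for this example; they are not taken from the SN dataset or from the authors' code.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators who labelled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal frequencies, summed over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # Degenerate case: both annotators used one identical label throughout.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Hypothetical sentiment labels for six headlines (illustrative only).
annotator_1 = ["pos", "neg", "neu", "pos", "neg", "pos"]
annotator_2 = ["pos", "neg", "neu", "neg", "neg", "pos"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"Cohen's Kappa: {kappa:.3f}")
```

Values near 1 indicate agreement well beyond chance, and values near 0 indicate agreement no better than chance. In practice an established implementation such as scikit-learn's `cohen_kappa_score` would normally be preferred over a hand-written one.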

