Please use this identifier to cite or link to this item: https://repository.seku.ac.ke/handle/123456789/6043
Full metadata record
DC Field | Value | Language
dc.contributor.author | Odhiambo, Fredrick O. | -
dc.date.accessioned | 2020-05-21T10:56:36Z | -
dc.date.available | 2020-05-21T10:56:36Z | -
dc.date.issued | 2020-04 | -
dc.identifier.citation | Mathematical Modelling and Applications; 5(2): 87-93 | en_US
dc.identifier.issn | 2575-1786 | -
dc.identifier.issn | 2575-1794 | -
dc.identifier.uri | http://article.sciencepublishinggroup.com/pdf/10.11648.j.mma.20200502.14.pdf | -
dc.identifier.uri | http://repository.seku.ac.ke/handle/123456789/6043 | -
dc.description | DOI: 10.11648/j.mma.20200502.14 | en_US
dc.description.abstract | Scientific literature lacks a straightforward answer as to which existing method of missing-data imputation is most suitable in terms of simplicity, accuracy and ease of use. Various methods of data imputation are explored, and a robust method of data imputation is then proposed. The paper uses simulated data sets generated from various distributions. A regression function is fitted to the simulated data sets, and the residual standard errors of the fitted function are obtained. Data are then randomly removed from the set of independent variables to create artificial non-response, and suitable methods are used to impute the missing values. The methods considered are mean imputation, regression imputation, hot- and cold-deck imputation, multiple imputation, median imputation, listwise deletion, the EM algorithm and the nearest-neighbour method. This paper investigates the three most common traditional methods of handling missing data to establish the optimal one. The most suitable method is taken to be the one whose imputed sample characteristics differ least from those of the original data set before imputation; this variation is measured using the regression intercept and the residual standard error. The R statistical package is used for most of the regression analyses. Microsoft Excel is used to determine the correlation between columns in the hot-deck method because it is readily available as part of the Microsoft Office package. For median imputation, the data analysis yielded an intercept and R-squared values that closely mirror those of the original data sets, suggesting that median imputation is the better choice among the conventional methods. This finding is important for research, given how often data are missing in scientific studies. Because the median is simple to find and use, most researchers already have a ready tool for handling missing data. | en_US
dc.language.iso | en | en_US
dc.subject | Regression | en_US
dc.subject | Nearest Neighbor | en_US
dc.subject | Hot Decking | en_US
dc.subject | Median Substitution | en_US
dc.subject | Missing Data | en_US
dc.title | Comparative study of various methods of handling missing data | en_US
dc.type | Article | en_US
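The evaluation procedure summarised in the abstract can be sketched in five steps:

1. Simulate data.
2. Fit a regression.
3. Delete values of the independent variable at random.
4. Impute the missing values.
5. Refit, and compare the intercept and residual standard error with the complete-data fit.

This is a minimal illustration under stated assumptions, not the author's code; the paper used R and Excel. The helper names (`fit_line`, `compare`, etc.) and the simulation settings are hypothetical. The assumed settings are one normally distributed predictor, n = 200, and a 20% missing-completely-at-random rate. Only single-imputation methods are shown: regression imputation, cold-deck imputation, multiple imputation and the EM algorithm are omitted. The hot-deck and nearest-neighbour donor rules below are simple assumed choices, not the correlation-based rule the paper describes.

```python
import math
import random
import statistics


def fit_line(x, y):
    """OLS fit of y = a + b*x; returns (intercept, slope, residual standard error).

    The residual standard error is sqrt(SSE / (n - 2)), the quantity R's lm()
    reports for a single-predictor model.
    """
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, math.sqrt(sse / (n - 2))


# Each method takes x (None marks a missing value) and the fully observed y,
# and returns the (x, y) pair that the regression is refitted on.
def impute_mean(x, y):
    m = statistics.fmean([v for v in x if v is not None])
    return [m if v is None else v for v in x], y


def impute_median(x, y):
    m = statistics.median([v for v in x if v is not None])
    return [m if v is None else v for v in x], y


def listwise_deletion(x, y):
    kept = [(xi, yi) for xi, yi in zip(x, y) if xi is not None]
    return [p[0] for p in kept], [p[1] for p in kept]


def random_hot_deck(x, y, rng):
    # Assumed donor rule: a randomly drawn observed value (not the paper's Excel rule).
    donors = [v for v in x if v is not None]
    return [rng.choice(donors) if v is None else v for v in x], y


def nearest_neighbour(x, y):
    # Assumed distance: the donor is the observed case whose y value is closest.
    observed = [(xi, yi) for xi, yi in zip(x, y) if xi is not None]
    filled = []
    for xi, yi in zip(x, y):
        if xi is None:
            xi = min(observed, key=lambda p: abs(p[1] - yi))[0]
        filled.append(xi)
    return filled, y


def compare(seed=1, n=200, rate=0.2, a=2.0, b=0.5):
    """Return the complete-data (intercept, RSE) and, per method, the change in each."""
    rng = random.Random(seed)
    x = [rng.gauss(10, 2) for _ in range(n)]        # hypothetical predictor
    y = [a + b * xi + rng.gauss(0, 1) for xi in x]  # hypothetical response
    base_a, _, base_rse = fit_line(x, y)
    # Missing completely at random: drop each x value with probability `rate`.
    x_miss = [None if rng.random() < rate else v for v in x]
    methods = {
        "mean": lambda: impute_mean(x_miss, y),
        "median": lambda: impute_median(x_miss, y),
        "listwise deletion": lambda: listwise_deletion(x_miss, y),
        "random hot deck": lambda: random_hot_deck(x_miss, y, rng),
        "nearest neighbour": lambda: nearest_neighbour(x_miss, y),
    }
    diffs = {}
    for name, run in methods.items():
        xi, yi = run()
        a_hat, _, rse = fit_line(xi, yi)
        diffs[name] = (a_hat - base_a, rse - base_rse)
    return (base_a, base_rse), diffs


if __name__ == "__main__":
    baseline, diffs = compare()
    print("complete data: intercept=%.3f, RSE=%.3f" % baseline)
    for name, (d_a, d_rse) in diffs.items():
        print(f"{name:>18}: d_intercept={d_a:+.3f}, d_RSE={d_rse:+.3f}")
```

Running the script prints how far each method's refitted intercept and residual standard error drift from the complete-data values. Under the criterion stated in the abstract, smaller absolute differences indicate a more suitable method. The outcome depends on the assumed distribution, missingness rate and seed, so a single run should not be read as reproducing the paper's results.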
Appears in Collections: School of Science and Computing (JA)

Files in This Item:
File | Description | Size | Format
Odhiambo_Comparative study of various methods of handling missing data .pdf | Abstract | 83.73 kB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.