Maximising the Value of Missing Data

Atai Winkler

Keywords:

missing data; imputation; gaps; holes; data mining; empty data

Abstract

The subject of missing values in databases and how to handle them has received very little attention in the statistics and data mining literature [1, 2, 3], and even less, if any at all, in the marketing literature. The usual attitude of practitioners is "we'll just have to ignore records with missing values". On the other hand, a few very advanced theoretical solutions have been developed, some of which have been applied, particularly to clinical trials data. These solutions can only be applied to small databases, not to the very large databases held by many companies on their customers. This paper describes a new method for imputing missing values in such very large databases. Two particular features of the method are that it can handle all combinations of variable type (continuous, ordinal and categorical) and that all the missing values in the database are imputed in one run of the software. It is based on the k-nearest neighbours method, a well-known method in data mining. The paper concludes by presenting the results of a study of this method when used to impute the missing values in a real set of data. This paper is only concerned with missing data, i.e. data that are not known but which have real values. It does not address the problem of empty data, i.e. data that are not known but which cannot have real values.
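
The abstract does not give the details of the method, so the following is only a minimal illustrative sketch of k-nearest-neighbour imputation over mixed variable types, not the author's implementation. It assumes a Gower-style distance (range-scaled absolute difference for continuous and ordinal columns, simple mismatch for categorical columns, averaged over the columns observed in both records), ordinal values coded as integer ranks, and missing entries marked with `None`. The function names `gower_distance` and `knn_impute`, the choice of mean, median and mode as the fill rules, and the example data are all hypothetical.

```python
from collections import Counter
from statistics import mean, median_low


def gower_distance(a, b, kinds, ranges):
    """Average per-column dissimilarity over columns observed in both records."""
    total, used = 0.0, 0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue  # skip columns missing in either record
        if kind == "categorical":
            total += 0.0 if x == y else 1.0
        else:  # continuous or ordinal (assumed coded as integer ranks)
            total += abs(x - y) / rng if rng else 0.0
        used += 1
    return total / used if used else None  # None: no shared observed columns


def knn_impute(rows, kinds, k=3):
    """Fill every None in one pass, using only originally observed values."""
    # Observed range of each numeric column, used to scale differences.
    ranges = []
    for j, kind in enumerate(kinds):
        if kind == "categorical":
            ranges.append(None)
        else:
            seen = [r[j] for r in rows if r[j] is not None]
            ranges.append(max(seen) - min(seen) if seen else 0)

    completed = [list(r) for r in rows]  # the input rows are left unchanged
    for i, row in enumerate(rows):
        for j, kind in enumerate(kinds):
            if row[j] is not None:
                continue
            # Donors: other records with column j observed and a defined distance.
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                d = gower_distance(row, other, kinds, ranges)
                if d is not None:
                    candidates.append((d, m))
            candidates.sort()
            values = [rows[m][j] for _, m in candidates[:k]]
            if not values:
                continue  # no usable donor: leave the gap
            if kind == "continuous":
                completed[i][j] = mean(values)
            elif kind == "ordinal":
                completed[i][j] = median_low(values)
            else:
                completed[i][j] = Counter(values).most_common(1)[0][0]
    return completed


# Hypothetical example: one column of each variable type.
kinds = ["continuous", "ordinal", "categorical"]
rows = [
    [25.0, 1, "A"],
    [27.0, 1, "A"],
    [26.0, None, "A"],
    [60.0, 3, "B"],
    [62.0, 3, None],
    [None, 3, "B"],
]
completed = knn_impute(rows, kinds, k=2)
for record in completed:
    print(record)
```

This sketch compares every incomplete record with every other record, so its cost grows quadratically with the number of records. For the very large customer databases the abstract has in mind, some form of indexing or sampling of donors would presumably be needed; the abstract does not say how the paper addresses this.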

How to Cite

Maximising the Value of Missing Data. (2014). Global Journal of Computer Science and Technology, 14(C3), 41-48. https://computerresearch.org/index.php/computer/article/view/101
