"Missing Value" and "Values": What is the connection?



Then what are my missing values? Are they -1, 998, 999 or none of these? What is the connection between the "missing value" and "Values"?

Would appreciate your help very much!!!


TS Contributor
A good dataset comes with a codebook, which tells you how the data were collected, the exact questions that were asked, the exact answer options that were given to the respondents, how these questions correspond to the variables in your data, what the codes mean, etc. We don't have the codebook for your data; we don't even know what your dataset is. So the first thing you need to do is go to the person who gave you the dataset and ask her/him for the codebook.

There are some conventions that can help us make a guess. How missing values are coded depends on the software being used, so if you prepare a dataset for general use, there are no codes that every program automatically recognizes as missing. Instead, we typically give missing values ordinary numbers and document in the codebook that these numbers actually mean missing. So if you want to use that dataset correctly, you need to transform these numbers into something your software package recognizes as missing. The value labels often follow similar conventions and use abbreviations: "DK" can be "Don't Know" and thus missing. But be warned, "DK" is also used as an abbreviation for "Denmark". Often you can guess from the context which one applies, but a good researcher does not guess; (s)he looks it up in the codebook. (Did I mention that the codebook is important?) "NA" is often used for "Not Available", i.e. generic missing, but there is no guarantee. I don't know "IAP", so you need to look this up in ... (you can guess what I am going to say now...)
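
To make the "transform these numbers" step concrete, here is a minimal sketch in plain Python. It assumes, purely for illustration, that -1, 998 and 999 (the candidates from your question) are the documented missing codes; only the codebook can confirm that. The names MISSING_CODES and recode_missing and the example values are made up for this sketch.

```python
# ASSUMPTION: these codes are hypothetical. Confirm the real missing-value
# codes in the codebook before recoding anything.
MISSING_CODES = {-1, 998, 999}


def recode_missing(values, missing_codes=MISSING_CODES):
    """Return a copy of values with the given missing codes replaced by None."""
    return [None if v in missing_codes else v for v in values]


# Made-up raw values for one variable, with missing codes mixed in.
raw = [34, 999, 27, -1, 51, 998]

clean = recode_missing(raw)

# Summary statistics should use only the observed (non-missing) values.
observed = [v for v in clean if v is not None]
mean_raw = sum(raw) / len(raw)
mean_observed = sum(observed) / len(observed)

print("recoded:", clean)
print("mean with codes left in:", round(mean_raw, 2))
print("mean of observed values:", round(mean_observed, 2))
```

If the codes are left in, they are treated as real measurements and distort every statistic you compute, which is why the recoding has to happen before any analysis. In a statistics package you would use its own missing-value mechanism rather than None, but the logic is the same.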