Imputing Missing Values using Regression


New Member
I have a table of 4 variables in which some of the data is missing, and I need to impute the missing values. My table is as follows:

X1 X2 X3 X4
1.0 20 3.5 4
1.1 18 4.0 2
1.9 22 2.2 -
0.9 15 - -

My notes for this material are quite limited and not very intuitive. Could someone explain how I can solve for the 4th value in X3? My notes say: use linear regression, sweep left-to-right, X3=a+b*X2+c*X1 and X4=d+e*X3+f*X2+g*X1, but I don't see where a, b, c, d, e, f and g come from.

From my limited knowledge of regression, there is generally a dependent variable y that we predict after estimating B0 and B1, but that doesn't seem to be the case here.

Can anyone point me in the right direction?


Less is more. Stay pure. Stay poor.
How much missing data do you have? Is the above example all of it? And those rows are observations, correct? Do you have any reason to think your data is missing at random, or is there a systematic cause?

I am guessing those a-g are just placeholders for the betas (the regression coefficients), like you mention.
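To make that concrete, here is a minimal sketch of the left-to-right sweep on the table from the question, assuming the coefficients are estimated by ordinary least squares on the rows where the target column is observed. The helper name regress_and_fill and the use of NumPy are my own choices for illustration, not something taken from the lecture notes. With only three complete rows for X3, the first regression fits those rows exactly, and with only two observed X4 values the second regression is underdetermined, so np.linalg.lstsq returns the minimum-norm solution. Treat the numbers as a demonstration of the mechanics, not as trustworthy imputations.

```python
import numpy as np

# Toy table from the question; np.nan marks a missing entry.
# Columns are X1, X2, X3, X4.
data = np.array([
    [1.0, 20.0, 3.5, 4.0],
    [1.1, 18.0, 4.0, 2.0],
    [1.9, 22.0, 2.2, np.nan],
    [0.9, 15.0, np.nan, np.nan],
])


def regress_and_fill(table, target, predictors):
    """Hypothetical helper: fit target ~ intercept + predictors by least
    squares on rows where the target is observed, then fill the missing
    target entries with the fitted predictions."""
    filled = table.copy()
    observed = ~np.isnan(filled[:, target])
    # Design matrix: a column of ones (intercept) followed by the predictors.
    design = np.column_stack([np.ones(len(filled)), filled[:, predictors]])
    coef, *_ = np.linalg.lstsq(
        design[observed], filled[observed, target], rcond=None
    )
    filled[~observed, target] = design[~observed] @ coef
    return filled, coef


# Step 1: X3 = a + b*X2 + c*X1, estimated from rows 1-3.
step1, (a, b, c) = regress_and_fill(data, target=2, predictors=[1, 0])
print("a, b, c =", a, b, c)
print("imputed X3 in row 4 =", step1[3, 2])  # approximately 5.03

# Step 2: X4 = d + e*X3 + f*X2 + g*X1, using the X3 column completed in step 1.
# Only two rows have X4 observed, so this fit is underdetermined (assumption:
# we accept the minimum-norm least-squares solution for the sake of the demo).
step2, (d, e, f, g) = regress_and_fill(step1, target=3, predictors=[2, 1, 0])
print(step2)
```

So a, b and c are not given anywhere in the table; they are what the regression estimates from the complete rows, and the missing X3 is then just the fitted equation evaluated at that row's X1 and X2.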


New Member
Hi, thanks for the reply. I've no idea whether it's MCAR, MAR or NMAR; it's an example from lecture material and the notes are not great.

That is all the observations that were provided.

I don't need to actually impute the missing values; I'm more interested in understanding the process of how you would use regression to impute the data.


No cake for spunky
I am not sure how the variables are measured. There are many possible algorithms for handling missing data; which one you use (and the results) would depend on what kind of data you have (interval, categorical, etc.).