WLS Confidence interval

Guys, can anyone assist?

I've googled and read lots on the matter of WLS (mainly lecture notes from assorted university courses that Google finds). However, I've not come across any that go into WLS in depth and specifically address how to produce a confidence interval (CI) and a prediction interval (PI).

I’m looking for how to calculate a confidence interval for the mean value at a point \(x_0\) from a regression carried out using weighted least squares, preferably from a reliable source I can quote to people at work if I’m quizzed on the matter!
The situation I am examining is combining data from a number of sources. To help me understand it, I am starting with the scenario of having one “good” source of data (low \(e_y\)) and another, not so good, source (higher \(e_y\)), which it is still desirable to include, either because only a small amount of good data is available or because it reduces the amount of extrapolation in a multivariable space.
I think I’ve got my head around WLS parameter estimation (I’m visualising it to myself as being conducted in a “space” where each error term has been standardised by dividing by \(\sigma_i\)).
However I now wish to make a prediction in “real” space for a vector of X.
Under OLS \( \hat{\beta }=\left(X^T X \right)^{-1}X^T Y \)
And the CI around the prediction for point \(x_0\) is a t-distribution quantile multiplied by \( s\sqrt{x_0^T (X^T X)^{-1} x_0} \)
and the PI uses \( s\sqrt{x_0^T (X^T X)^{-1} x_0 + 1} \)
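For reference, here is a minimal NumPy sketch of those two OLS half-widths. The data, the point `x0` and the hard-coded 95% t-quantile for 28 degrees of freedom are all made up for illustration.

```python
import numpy as np

# Made-up straight-line data, purely for illustration.
rng = np.random.default_rng(0)
n = 30
x = np.linspace(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])        # design matrix with intercept
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.5, n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                # OLS estimate
resid = y - X @ beta_hat
p = X.shape[1]
s = np.sqrt(resid @ resid / (n - p))        # residual standard error

x0 = np.array([1.0, 4.0])                   # [intercept term, x value]
y0_hat = x0 @ beta_hat
h0 = x0 @ XtX_inv @ x0                      # x0^T (X^T X)^{-1} x0

t_crit = 2.048                              # approx. t(0.975, df=28), hard-coded assumption
ci_half = t_crit * s * np.sqrt(h0)          # half-width of CI for the mean at x0
pi_half = t_crit * s * np.sqrt(h0 + 1.0)    # half-width of PI for a new observation
print(y0_hat, ci_half, pi_half)
```

The intervals are then `y0_hat ± ci_half` and `y0_hat ± pi_half`.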
For WLS \( \hat{\beta }=\left(X^T W^{-1} X \right)^{-1}X^TW^{-1} Y \), where W is a matrix whose diagonal elements are the (expected/estimated/guessed) variances of the error terms,
as it’s based on rescaling so that \(Y'_i=\frac{Y_i}{\sigma_i}=\frac{\alpha}{\sigma_i}+\frac {\beta x_i}{\sigma_i}+ \frac {\epsilon_i}{\sigma_i}\). This transformation makes the error term NIID(0,1). (Well, it does if we are assuming it was normal, independent and had variance \( \sigma_i ^{2} \).)
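As a quick sanity check of that rescaling picture, this sketch compares the matrix formula with plain least squares on rows divided by \(\sigma_i\). The two data sources and their noise levels are invented for illustration.

```python
import numpy as np

# Invented example: 10 "good" points (sd 0.5) and 30 "poor" points (sd 3.0).
rng = np.random.default_rng(1)
n = 40
x = np.linspace(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
sigma = np.where(np.arange(n) < 10, 0.5, 3.0)
y = 1.0 + 2.0 * x + rng.normal(0.0, sigma)

# Route 1: the matrix formula with W = diag(sigma_i^2).
W_inv = np.diag(1.0 / sigma**2)
beta_wls = np.linalg.solve(X.T @ W_inv @ X, X.T @ W_inv @ y)

# Route 2: divide every row (intercept column included) by sigma_i, then do OLS.
Xs = X / sigma[:, None]
ys = y / sigma
beta_rescaled, *_ = np.linalg.lstsq(Xs, ys, rcond=None)

print(beta_wls, beta_rescaled)
```

Both routes should give the same coefficients, and those coefficients are still in the original (unscaled) units of x and y.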

But… what happens to the CI and PI calculations?! \(\left(X^T W^{-1} X \right)^{-1}\) seems like a logical replacement for \(\left(X^T X \right)^{-1}\) (which, if the data is mean centred, replaces a variance/covariance matrix with a correlation matrix), but… \(x_0\) is not standardised, so \(x_0^T \left(X^T W^{-1} X\right)^{-1} x_0\) would not be correct. I could divide \(x_0\) by some value, maybe some sort of average?
(But then what happens if the heteroscedasticity is a function of X due to some sort of systematic issue, or W contains covariance as well as variance terms?)
But… I can’t be the first person to want to do this? So could anyone give me some pointers?!
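One way to probe that worry numerically is to work out the variance of \(x_0^T\hat{\beta}\) directly from \(\hat{\beta}=AY\) and compare it with the candidate formula. This is only a sketch: the covariance matrix below is invented, and it assumes W is the true covariance of the errors.

```python
import numpy as np

n = 25
x = np.linspace(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])

# Invented covariance: sd grows with x, neighbouring errors are correlated.
sd = 0.5 + 0.3 * x
idx = np.arange(n)
corr = 0.4 ** np.abs(np.subtract.outer(idx, idx))
W = corr * np.outer(sd, sd)

W_inv = np.linalg.inv(W)
M = np.linalg.inv(X.T @ W_inv @ X)     # (X^T W^-1 X)^-1
A = M @ X.T @ W_inv                    # beta_hat = A @ Y

x0 = np.array([1.0, 4.0])              # left in original units, not rescaled

# Var(x0^T beta_hat) from first principles, using Cov(Y) = W.
var_direct = x0 @ A @ W @ A.T @ x0
# Candidate formula with the unscaled x0.
var_formula = x0 @ M @ x0

print(var_direct, var_formula)
```

Under that assumption the two numbers agree, which suggests \(x_0\) can stay unscaled. If W is only known up to a constant, the usual extra step is to multiply by an estimated scale \(s_w^2 = r^T W^{-1} r/(n-p)\), where \(r\) is the residual vector.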

I'm happy that, once the CI has been estimated, adding an appropriate amount of variance for the independent "error term" should handle the PI.

(Sorry for the long post!) ;)

Many Thanks
Additional note: I did check my logic by trying W = nI, which confirmed my CI was out by a factor of n, as I expected. I'm stumped on how to handle it when W is more complex! ;)
Thanks for the reply.

What exactly do you mean by this?
Sorry was having notation trouble and running out of time at work! ;)

To check my logic I tried using different weighting matrices, each diagonal with every diagonal value equal to n, to examine what impact that had on the CI calculations. As expected, as n increased from 1 the CI calculations went wonky, along with the estimate of \(e_y\), but the values of the coefficients and their standard errors remained the same. Once the value on the diagonals reached the OLS estimated \(e_y\), the WLS gave a value of \(e_y\) of 1, because I'd rescaled the points to effectively standardise the regression.
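That check can be reproduced with a small sketch like the one below. The helper `wls_fit` and the data are hypothetical, the scalar is called `c` here to avoid clashing with the sample size, and W is taken to hold variances (not standard deviations).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
x = np.linspace(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.5, n)

def wls_fit(X, y, W):
    """Hypothetical helper: WLS coefficients, scale estimate and standard errors."""
    W_inv = np.linalg.inv(W)
    M = np.linalg.inv(X.T @ W_inv @ X)
    beta = M @ X.T @ W_inv @ y
    r = y - X @ beta
    s2 = r @ W_inv @ r / (len(y) - X.shape[1])   # residual variance in the scaled space
    se = np.sqrt(s2 * np.diag(M))                # coefficient standard errors
    return beta, s2, se

beta_ols, s2_ols, se_ols = wls_fit(X, y, np.eye(n))      # W = I is plain OLS
c = 5.0
beta_c, s2_c, se_c = wls_fit(X, y, c * np.eye(n))        # W = c I
beta_u, s2_u, se_u = wls_fit(X, y, s2_ols * np.eye(n))   # W = (OLS s^2) I

print(s2_ols, s2_c, s2_u)
```

With W = cI the coefficients and their standard errors match OLS while the scale estimate shrinks by a factor of c, and setting c to the OLS residual variance gives a scale estimate of 1.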