Edit: can't sort out when mathjax displays or not...or inline or not ... help...

Edit2: sorted...

Edit\(\infty\): sorted sorted...

Edit\(\infty^2\): Added TLDR

TLDR: Can anyone explain, intuitively or otherwise, the following bias correction for a weighted sample variance?

$$s^2 = \sigma^2 \frac {W^2} {W^2 - N}$$ where \(\sigma^2\) is the biased sample variance, \(s^2\) is the unbiased sample variance, \(W\) is the sum of the (not normalized) weights, and \(N\) is the number of passes (not the total number of observations)?

---------------------------------------------------------------------------------------------------

I am trying to understand the variance calculation of remotely sensed data in the source code of a processing application. The software averages data spatially and temporally. For each satellite pass the data is spatially binned with a simple average. The binned data is then averaged temporally using a weight related to the number of observations per spatial bin per pass. I will describe the calculation for a single spatial bin and multiple passes.

The number of observations per spatial bin per pass is \(n_i\), where \(i=1\) to \(N\) and \(N\) is the number of passes (\(i\) is the temporal index).

The weight \(w_{ij}\) is ad hoc:

$$w_{ij} = \frac{1}{\sqrt{n_i}}$$

where \(j=1\) to \(n_i\) (index of data points per spatial bin per pass).

[If each pass were given identical weight, regardless of the number of observations per bin per pass, that would be equivalent to weighting each observation in the bin by \(\frac{1}{n_i}\). The software developer used \(\frac{1}{\sqrt {n_i}}\) as a compromise so that bins with more observations are not "under-weighted".]

Then the sum of all weights \(W\) is:

$$W = \sum_{i=1}^N \sum_{j=1}^{n_i}\frac{1}{\sqrt {n_i}} = \sum_{i=1}^N\sqrt {n_i}$$
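To make the bracketed comparison and the sum \(W\) concrete, here is a small Python sketch. The function names and the example counts are hypothetical, chosen only for illustration. It computes the total weight each pass receives under equal-observation weighting (\(w_{ij}=1\)), equal-pass weighting (\(w_{ij}=1/n_i\)) and the \(1/\sqrt{n_i}\) compromise, and checks that the literal double sum for \(W\) matches \(\sum_i \sqrt{n_i}\).

```python
import math


def per_pass_weight_totals(counts):
    """Total weight given to each pass under three weighting schemes.

    counts[i] is n_i, the number of observations in the bin on pass i.
    Returns three lists of per-pass totals (sum over j of w_ij).
    """
    equal_obs = [n * 1.0 for n in counts]            # w_ij = 1           -> total n_i
    equal_pass = [n * (1.0 / n) for n in counts]     # w_ij = 1/n_i       -> total 1
    compromise = [n / math.sqrt(n) for n in counts]  # w_ij = 1/sqrt(n_i) -> total sqrt(n_i)
    return equal_obs, equal_pass, compromise


def total_weight(counts):
    """W computed two ways: the literal double sum and the closed form."""
    # Double sum over passes i and observations j, as in the definition of W.
    w_double = sum(1.0 / math.sqrt(n) for n in counts for _ in range(n))
    # Closed form: each pass contributes n_i / sqrt(n_i) = sqrt(n_i).
    w_closed = sum(math.sqrt(n) for n in counts)
    return w_double, w_closed


counts = [1, 4, 9]  # hypothetical observation counts for three passes
print(per_pass_weight_totals(counts))
print(total_weight(counts))
```

For these counts the per-pass totals are 1, 4, 9 (equal observations), 1, 1, 1 (equal passes) and 1, 2, 3 (compromise), so \(W = 6\): a pass with nine times as many observations gets three times the weight, rather than nine times or the same.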

the mean \(\bar x\) is:

$$\bar x = \frac {1} {W} \sum_{i=1}^N \frac {1}{\sqrt {n_i}} \sum_{j=1}^{n_i}x_{ij}$$
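Writing \(\bar x_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}\) for the simple spatial bin average of pass \(i\) (notation introduced here for illustration), the inner sum is \(\frac{1}{\sqrt{n_i}}\sum_{j=1}^{n_i} x_{ij} = \sqrt{n_i}\,\bar x_i\), so the mean can equivalently be written as a weighted average of the per-pass bin averages:

$$\bar x = \frac{\sum_{i=1}^N \sqrt{n_i}\,\bar x_i}{\sum_{i=1}^N \sqrt{n_i}}$$

In other words, at the pass level each binned value gets weight \(\sqrt{n_i}\).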

and the biased variance \(\sigma^2\) is:

$$\sigma^2 = \left( \frac {1} {W} \sum_{i=1}^N \frac {1}{\sqrt {n_i}} \sum_{j=1}^{n_i}x_{ij}^2 \right)-{\bar x}^2$$
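As a sketch for checking numbers by hand, here is a direct transcription of the three formulas above into Python. The function name and the input layout (one list of observations per pass) are assumptions for illustration, not taken from the NASA code; skipping passes with no observations in the bin is also an assumption.

```python
import math


def weighted_bin_stats(passes):
    """Weighted mean and biased weighted variance for one spatial bin.

    passes: list of lists; passes[i] holds the n_i observations x_ij from
    pass i. Every observation in pass i gets weight w_ij = 1/sqrt(n_i).
    Returns (W, mean, biased_variance).
    """
    W = 0.0        # sum of all weights w_ij
    sum_wx = 0.0   # sum of w_ij * x_ij
    sum_wx2 = 0.0  # sum of w_ij * x_ij**2
    for obs in passes:
        n = len(obs)
        if n == 0:
            continue  # assumption: an empty pass contributes nothing
        w = 1.0 / math.sqrt(n)
        W += w * n
        sum_wx += w * sum(obs)
        sum_wx2 += w * sum(x * x for x in obs)
    mean = sum_wx / W
    # Weighted mean of squares minus squared mean, as in the formula above.
    biased_var = sum_wx2 / W - mean ** 2
    return W, mean, biased_var


print(weighted_bin_stats([[1.0], [2.0, 4.0, 6.0, 8.0]]))
```

For this small example \(W = 3\), \(\bar x = 11/3\) and \(\sigma^2 = 62/9\). The "mean of squares minus squared mean" form matches the formula as written, but it can lose precision when the mean is large compared with the spread; \(\frac{1}{W}\sum_{i,j} w_{ij}(x_{ij}-\bar x)^2\) is algebraically identical and numerically safer.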

My confusion comes with what appears to be a bias correction.

The unbiased variance \(s^2\) is:

$$s^2 = \sigma^2 \frac {W^2} {W^2 - N}$$
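One possible reading of the correction (an interpretation, not something confirmed by the code authors): with per-observation weights \(w_{ij} = 1/\sqrt{n_i}\), the sum of the squared weights is

$$\sum_{i=1}^N \sum_{j=1}^{n_i} w_{ij}^2 = \sum_{i=1}^N n_i \cdot \frac{1}{n_i} = N$$

so the factor has the same form as the standard bias correction for "reliability" weights,

$$s^2 = \sigma^2 \frac{V_1^2}{V_1^2 - V_2}, \qquad V_1 = \sum_{i,j} w_{ij} = W, \quad V_2 = \sum_{i,j} w_{ij}^2 = N.$$

If all observations are independent with a common variance \(\tau^2\) (an idealising assumption), then \(E[\sigma^2] = \tau^2\left(1 - V_2/V_1^2\right)\), which is exactly what the factor undoes. With equal weights \(w_{ij} = 1\) and \(n\) observations, \(V_1 = V_2 = n\) and the factor reduces to the familiar \(n/(n-1)\). The Python sketch below is a Monte Carlo check under that independence assumption; the pass counts, the normal distribution, the true variance and the function names are all hypothetical choices for illustration.

```python
import math
import random


def corrected_variance(passes):
    """Biased weighted variance and its W^2 / (W^2 - N) corrected version.

    passes: list of non-empty lists; passes[i] holds the n_i observations
    from pass i, each weighted by 1/sqrt(n_i). Needs at least two
    observations in total, otherwise W^2 - N is zero.
    """
    N = len(passes)
    W = sum(math.sqrt(len(obs)) for obs in passes)
    mean = sum(sum(obs) / math.sqrt(len(obs)) for obs in passes) / W
    # Sum of w_ij * (x_ij - mean)^2 divided by W; algebraically equal to
    # the "mean of squares minus squared mean" form used in the post.
    biased = sum(
        sum((x - mean) ** 2 for x in obs) / math.sqrt(len(obs)) for obs in passes
    ) / W
    return biased, biased * W**2 / (W**2 - N)


def simulate(counts, true_var=4.0, trials=20000, seed=1):
    """Average both estimators over many synthetic bins.

    Assumes every observation is an independent normal draw with the same
    variance true_var (a hypothetical, idealised setting).
    """
    rng = random.Random(seed)
    sd = math.sqrt(true_var)
    total_biased = total_corrected = 0.0
    for _ in range(trials):
        passes = [[rng.gauss(10.0, sd) for _ in range(n)] for n in counts]
        biased, corrected = corrected_variance(passes)
        total_biased += biased
        total_corrected += corrected
    return total_biased / trials, total_corrected / trials


print(simulate([1, 2, 3]))  # hypothetical pass counts
```

With counts [1, 2, 3], \(W = 1 + \sqrt 2 + \sqrt 3 \approx 4.15\) and \(W^2 \approx 17.2\), so the biased estimate should average about \(4(1 - 3/17.2) \approx 3.3\), while the corrected one should average close to the true value of 4. If observations within a pass are correlated, this calculation no longer guarantees that the correction removes the bias.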

Can anyone make sense of this correction? The technical report for the methodology does not show the bias correction; it appears only in the code. The code writers have so far been unable to provide a reason or justification for it. Since the code writers are NASA and I am a tiny fish in a big pond, I don't expect to get an answer from them, although I have tried. The correction was probably proposed over 25 years ago by an in-house statistician who has since retired. I would be grateful for an intuitive explanation; I have a weak stats background. Thanks.
