Tricky running mean and standard deviation problem

#1
Hi,

I have a pretty tricky problem that I can't seem to solve. The basic problem can be expressed as follows:

- I have a stream of data (in my example code below its expressed as a loop where the data is just the values 1 -> 20.
- I want to calculate the mean and standard deviation at each iteration of the loop, but only for the last five values.
- Normally for a running mean and std dev I would do something like:

Code:
for ( i = 0 ; i < 20 ; ++i )
{
    data  = (i+1) ;
    temp  = mean ;
    mean += (data - mean) / (i+1) ;
    svar += (data - temp) * (data - mean) ;
    stdd  = sqrt(svar/(i+1)) ;
}
but because I only want to find the stats for the last five values I have to do something like this

Code:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>


int main()
{
    /* variables to keep statistics data */
    
    double * data ;
    double mean ;
    double stdd ;
    double sum ;
    
    /* number of frames needed to get a mean and standard deviation */
    
    int frames_needed ;
    
    /* elements to index data array */
    
    int new_index ;
    int old_index ;
    int i, j ;
    
    /* initialise variables */
    
    frames_needed = 5 ;
    data = (double *) malloc((frames_needed) * sizeof(double)) ;
    sum = new_index = old_index = 0 ;
    old_index = -1 ;
    
    for ( i = 0 ; i < 20 ; ++i )
    {
        /* get data */
        data[new_index] = (i+1) ;
        
        /* sum data to get mean */
        sum += data[new_index] ;
        
        /* if enough data has been taken to get a mean */
        if ( i+1 >= frames_needed )
        {
            /* calculate the mean */
            mean = sum / frames_needed ;
            
            /* calculate standard deviation */
           
            stdd = 0 ;
            
            for ( j = 0 ; j < frames_needed ; ++j )
            {
                stdd += (data[j] - mean)*(data[j] - mean) ;
            }
            
            stdd = sqrt(stdd/frames_needed) ;
            
            printf("mean = %f, standard deviation = %f\n", mean, stdd) ;
            
            /* increment the oldest element index for the next iteration */
            old_index = (old_index == frames_needed-1) ? 0 : old_index + 1 ;
            
            /* remove old data from sum */
            sum -= data[old_index] ;
        }
    
        /* increment the newest element index for next iteration */
        new_index = (new_index == frames_needed-1) ? 0 : new_index + 1 ;
    }

    return 0 ;
}
The annoying thing is the nested loop calculating the standard deviation, it may not seem too bad, but the actual data I need to process are images and every extra loop takes up valuable processing time.

Does anyone have any idea how I might be able to calculate the standard deviation without having a nested for loop, in a way perhapes similar to the example at the top.

Thanks for your time,

James

PS. I thought I would post the whole code in case anyone wanted to have a go, obviously this is trivial as the standard deviation is sqrt(2) every time.
 

Dragan

Super Moderator
#2
Does anyone have any idea how I might be able to calculate the standard deviation without having a nested for loop.....Thanks for your time,

James


---------------------------------------------------------------

Here's an idea. For the example you give of 20 values, calculate the mean and variance (or standard deviation) for the first 15 values. Let's denote these statistics as: Xbar_15, Var[X]_15, and let n=15 denote the sample size.

Next, the mean and variance that includes the 16th value of x denoted as x_16 can be computed as follows:

Xbar_16 = (1/(n + 1))*x_16 + (n/(n + 1))*Xbar_15

Var[X_16] =[ (n^2 -1)*Var[X_15] + n*(x_16 - Xbar_15)^2 ] / (n*(n+1)).



Now, continue on....i.e.

let m = n+1, so we have:

Xbar_17 = (1/(m + 1))*x_17 + (m/(m + 1))*Xbar_16

Var[X_17] =[ (m^2 -1)*Var[X_16] + m*(x_17 - Xbar_16)^2 ] / (m*(m+1)).

Get the idea (more generally)?...I think this might solve your problem. Obviously, just take the Sqrt[Var] to get the std. deviation.
 
Last edited:
#3
hi, thank you for your reply. I'm not sure it solves my problem though, I'd like to be able to calculate the standard deviation as I go but at each point I need to some how remove the oldest data's effect on the standard deviation, I've done this for the mean by subtracting the oldest bit of data from the running sum each time.

for example

- I have a series of data (1,2,3,4,5,6)
- after the first 5 bits of data I want the mean and std dev of 1,2,3,4 and 5
- on the next iteration I want the mean and std dev of 2,3,4,5 and 6

so I need to remove the effect that the first bit of data (1) has on my running standard deviation.

Thanks again,

James
 

vinux

Dark Knight
#4
Try straight forward apporach...

Code:
for ( i = 0 ; i < 16 ; ++i )
    {
        /* I assume you have the data 
        data[new_index] = (i+1) ;
                                                */        

        /*  mean */
        mean[i]= (data[i]+data[i+1]+data[i+2]+data[i+3]+data[i+4])/5. ;
     /*    */
       var[i] =   ( (data[i] - mean[i])**2+....   )   ;
            
     sd[i] = sqrt(var[i]);
    }
 

Dragan

Super Moderator
#7
hi, thank you for your reply. I'm not sure it solves my problem though, I'd like to be able to calculate the standard deviation as I go but at each point I need to some how remove the oldest data's effect on the standard deviation, I've done this for the mean by subtracting the oldest bit of data from the running sum each time.

for example

- I have a series of data (1,2,3,4,5,6)
- after the first 5 bits of data I want the mean and std dev of 1,2,3,4 and 5
- on the next iteration I want the mean and std dev of 2,3,4,5 and 6

so I need to remove the effect that the first bit of data (1) has on my running standard deviation.

Thanks again,

James

Well, if you've got the mean of the next iteration, let's give this a try.....

Let: A={x1,x2,x3,x4,x5} and B={x2,x3,x4,x5,x6} where sets A and B consist of n elements. The variance of B, Var (i.e. the next iteration) can be expressed in terms of the Var[A] and the means of A and B (Abar and Bbar) as follows:

Var = [ (n-1)*Var[A] + n*Abar^2 + (x6^2 - x1^2) - n*Bbar^2 ] / (n-1)

See if this helps.
 
Last edited:

Dragan

Super Moderator
#9
- I have a series of data (1,2,3,4,5,6)
- after the first 5 bits of data I want the mean and std dev of 1,2,3,4 and 5
- on the next iteration I want the mean and std dev of 2,3,4,5 and 6

so I need to remove the effect that the first bit of data (1) has on my running standard deviation.

Thanks again,

James

Here is some (Fortran) source code that will implement the method (expression)
I described in my last post.

As you can see the algorithm does not require any nested loops.

_________________________________________________________

Set the sample size (N) and the point where you want to start running the mean and variance or standard deviation (M).

M=N-J

(* COMPUTE THE MEAN AND VARIANCE ON THE VALUES FROM 1 TO M *)

MEAN1=0.0
SUM=0.0
DO 10 I=1,M
MEAN1=MEAN1+X(I)/M
SUM=SUM+X(I)**2
10 CONTINUE

VAR1=(SUM - M*MEAN1**2) / (M - 1.0)

(* BEGIN RUNNING MEANS AND VARIANCE COMPUTATIONS *)

DO 20 I=1,J

MEAN2=MEAN1+(X(M+I)-X(I)) / M

VAR2=((M-1)*VAR1 + M*MEAN1**2 + (X(M+I)**2 - X(I)**2) - M*MEAN2**2)/(M-1)

MEAN1=MEAN2
VAR1=VAR2

20 CONTINUE

END
__________________________________________________


I checked the code and it works just fine.
 
Last edited: