# math to compute width of gaussian PDF(x) given sigma & N values of random variable x

Hi Experts,

I'm working in industry and have an application requiring some expert knowledge of statistics/probability. I have a probability density function (PDF) for a Gaussian random variable. I know the standard deviation of the PDF. I also know the total number of experiments conducted, where one experiment is one value of the random variable, x.

For example, the standard deviation in my application is 1 ps RMS (i.e. 1 ps = 1E-12 seconds). The number of measured values for my random variable is 600E+9 (i.e. 600E+9 individual values of x; I don't have the individual values, but I know 600E+9 of them were measured).

From this information, I need to predict the largest peak-to-peak deviation that may be observed (i.e. the width of the PDF, from the end of one tail to the end of the other tail) as a function of a given confidence level or confidence interval. I'm not sure which is the right terminology here; I believe this level/interval is needed to define the goal, correct me if not.

Can anyone help me understand the equations involved? I know Gaussian PDF for random variable x is

PDF(x) = (1/(sigma*sqrt(2*pi))) * e^(-x^2/(2*sigma^2))

Not sure how to quantify the largest peak-to-peak deviation expected based on the number of acquired samples N and sigma. Thanks in advance. -Tim

I'm not sure that I understand you, so let's see if I get this right.

- You are assuming a normal distribution, of which you know the SD.
- You have 600 x 10^9 observations
- You want to determine the mean.
- You want to determine a confidence interval for this estimate of the mean

Am I right?

The mean is known to be zero. The standard deviation is also known, and referred to using the "sigma" variable above. I can provide the confidence level if needed to solve this problem (e.g. we can assume 95%), which I think should be required (correct me if not).

I don't actually have the N = 600x10^9 individual observations, but I want to know: if I DO observe N separate measurements of the random variable x, how far out into the tail of the Gaussian PDF can I observe (e.g. with a 95% confidence level)? For example, given N, sigma, and CL = 95% (mean = 0) for a normal distribution, I can at most observe Q sigma into the left tail of the PDF and, since the distribution is symmetric, Q sigma into the right tail, so the maximum peak-to-peak deviation I can observe is 2*Q*sigma. My question is how to solve for Q. Hope that helps.
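One way to read the question about Q is: find the Q for which all N measurements land inside [-Q*sigma, +Q*sigma] with probability CL. The sketch below works under that reading and assumes the N measurements are independent draws from a zero-mean Gaussian. The function name `tail_multiplier` is made up for illustration, and this is only one possible definition of "confidence level" for this problem.

```python
import math
from statistics import NormalDist


def tail_multiplier(n_samples: float, confidence: float) -> float:
    """Return Q such that all n_samples independent draws from a zero-mean
    Gaussian fall inside [-Q*sigma, +Q*sigma] with probability `confidence`.

    Assumption: the draws are independent and identically distributed.
    """
    # Each draw must be inside the band, so (1 - p_outside)**n = confidence.
    # expm1 avoids the cancellation in 1 - confidence**(1/n) for huge n.
    p_outside = -math.expm1(math.log(confidence) / n_samples)
    # Split p_outside evenly between the two tails and invert the
    # lower-tail CDF, which stays accurate for very small probabilities.
    return -NormalDist().inv_cdf(p_outside / 2.0)


sigma_ps = 1.0  # 1 ps RMS, from the question
q = tail_multiplier(600e9, 0.95)
peak_to_peak_ps = 2.0 * q * sigma_ps
print(f"Q = {q:.2f} sigma, peak-to-peak = {peak_to_peak_ps:.1f} ps")
```

For N = 600E+9 and CL = 95% this gives a Q of roughly 7.5 sigma per side. Because Q grows only about as fast as sqrt(2*ln(N)), it changes slowly with both N and the chosen confidence level.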

So one way to think of this is you want to find a prediction interval for the observed maximum value for that sample?

Or you want to compute the expected sample range from 600E+9 samples?

(sample maximum - sample minimum)
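Both readings can be sketched from the same fact: for N independent draws, P(sample maximum <= m) = Phi(m/sigma)^N, which can be inverted for any quantile of the maximum. The code below is a rough illustration assuming independent, zero-mean Gaussian samples; `max_quantile` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def max_quantile(n_samples: float, sigma: float, p: float) -> float:
    """Value m with P(sample maximum <= m) = p for n_samples i.i.d. N(0, sigma^2)."""
    # Phi(m/sigma)**n = p  =>  upper tail of a single draw = 1 - p**(1/n).
    upper_tail = -math.expm1(math.log(p) / n_samples)
    # By symmetry the upper-tail point is minus the lower-tail quantile.
    return -sigma * NormalDist().inv_cdf(upper_tail)


n, sigma_ps = 600e9, 1.0
median_max = max_quantile(n, sigma_ps, 0.5)
lo = max_quantile(n, sigma_ps, 0.025)
hi = max_quantile(n, sigma_ps, 0.975)
print(f"median max = {median_max:.2f} ps, 95% interval = [{lo:.2f}, {hi:.2f}] ps")
# The minimum mirrors the maximum, so a typical range is roughly
# twice the median maximum.
print(f"typical range ~ {2 * median_max:.1f} ps")
```

The interval [lo, hi] is a 95% prediction interval for the largest single value, while twice the median maximum is a rough figure for the typical (sample maximum - sample minimum).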

Hi Dason,

Sounds right, although honestly I'm not experienced enough to match my application with the precise meaning of "prediction interval". If we conduct N measurements of random variable x having a zero mean, and compute a histogram of the results, I want to compute the width of the histogram. I don't actually have the N measurements (otherwise I'd simply plot the histogram and measure it), but let's say I want to predict the histogram's width if I WERE to conduct N measurements. Is there any math to compute the histogram's width, which I assume requires one to provide some confidence level or probability?

Hi BGM, Yes, I want to predict (sample maximum - sample minimum), where sample maximum is the largest POSITIVE value of x measured, and sample minimum is the largest NEGATIVE value of x measured (e.g. the mean is zero). I assume a symmetrical distribution, so the absolute value of both of these values should be equal (and I'll multiply this value by 2 to get my final peak-to-peak result).

Why are you using the normal distribution?

The nature of the random variable is random (e.g. thermal noise).

The nature of the random variable is random (e.g. thermal noise).
I don't know anything about thermal noise, but why does that make it normal?

Wikipedia says:
In probability theory, the normal (or Gaussian) distribution is a continuous probability distribution that is often used as a first approximation to describe real-valued random variables that tend to cluster around a single mean value.

This to me implies that data distributed this way is not random.

There are several tests for normality that you can use to prove it.

This to me implies that data distributed this way is not random.
What? Are you saying that data that is distributed like a normal distribution isn't random?

There are several tests for normality that you can use to prove it.

Typically you can't 'prove' normality. You can provide evidence against the idea that the distribution is normal, but there isn't a very good way to provide evidence that the distribution is normal. Given the sample size, even if the data is almost exactly normal, if it isn't perfectly normally distributed then it will fail basically any test of normality.

By the way - a lot of times 'noise' does tend to follow an approximate normal distribution.

Even if the individual noise sources are not Gaussian (normally distributed) at all, by the central limit theorem the sum of many independent noise sources converges to a Gaussian as the number of sources increases. A larger number of samples then lets the histogram follow that distribution more closely.
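A small simulation can illustrate the central limit theorem argument. This is only a toy sketch: it assumes each noise source is uniform on (-1, 1), which is an arbitrary choice and not a model of thermal noise, and it uses excess kurtosis (0 for a Gaussian, -1.2 for a uniform) as a crude measure of how Gaussian the sum looks.

```python
import random
import statistics


def excess_kurtosis(values):
    """Sample excess kurtosis: about 0 for Gaussian data, -1.2 for uniform data."""
    mean = statistics.fmean(values)
    var = statistics.fmean((v - mean) ** 2 for v in values)
    fourth = statistics.fmean((v - mean) ** 4 for v in values)
    return fourth / var**2 - 3.0


def summed_noise(n_sources, n_samples, rng):
    """Each sample is the sum of n_sources independent uniform(-1, 1) terms."""
    return [
        sum(rng.uniform(-1.0, 1.0) for _ in range(n_sources))
        for _ in range(n_samples)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
for k in (1, 2, 12):
    kurt = excess_kurtosis(summed_noise(k, 20_000, rng))
    print(f"{k:2d} sources: excess kurtosis = {kurt:+.3f}")
```

The excess kurtosis should move toward zero as more sources are summed. Note that the central part of the distribution becomes Gaussian much faster than the far tails, which matters when extrapolating to 7 sigma.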

Since you have a very large sample size, you may be able to use asymptotic theory to get an approximation.

You may want to look up the book on order statistics by H.A. David, and also search for extreme value theory. Sorry, I have very limited knowledge of order statistics, so I cannot help much more.
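For reference, the classical extreme value result for Gaussian samples says the maximum of N standard normal draws is approximately b_N + G/a_N, where G is a standard Gumbel variable, a_N = sqrt(2*ln(N)), and b_N = a_N - (ln(ln(N)) + ln(4*pi))/(2*a_N). The sketch below uses that approximation to estimate the expected maximum and range. The function names are invented for illustration, independent samples are assumed, and the approximation is known to converge slowly, so treat the result as a rough figure.

```python
import math

EULER_GAMMA = 0.5772156649015329  # mean of a standard Gumbel variable


def gumbel_constants(n_samples: float):
    """Location b_N and scale 1/a_N for the maximum of N standard normals."""
    a_n = math.sqrt(2.0 * math.log(n_samples))
    b_n = a_n - (math.log(math.log(n_samples)) + math.log(4.0 * math.pi)) / (2.0 * a_n)
    return b_n, 1.0 / a_n


def approx_expected_max(n_samples: float, sigma: float) -> float:
    """Asymptotic mean of the sample maximum, in the units of sigma."""
    location, scale = gumbel_constants(n_samples)
    return sigma * (location + EULER_GAMMA * scale)


n, sigma_ps = 600e9, 1.0
e_max = approx_expected_max(n, sigma_ps)
# E[min] = -E[max] by symmetry, so E[range] = 2 * E[max].
print(f"E[max] ~ {e_max:.2f} ps, E[range] ~ {2 * e_max:.1f} ps")
```

The leading term sqrt(2*ln(N)) shows why the answer is insensitive to N: going from 600E+9 to ten times as many samples adds only a fraction of a sigma to the expected maximum.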

Normal distribution implies some kind of weighting around a central point. Can that occur and still be "random"?

My understanding of the CLT is that the underlying process that produces the noise must be stationary.

Normal distribution implies some kind of weighting around a central point. Can that occur and still be "random"?
I think you're confused as to what "random" means. You might be thinking that "random" implies some sort of uniform distribution. That is not the case.

Why do you think that?

Because you don't think that data can be created by a random process and still end up having a weighting around a central point? It's very possible that I'm misunderstanding you but why don't you explain to me why you think that.

By saying noise is normally distributed you are implying a distribution of some characteristic of the noise.

