Distribution of Extreme Spread for (n, sigma)

dbooksta

New Member
Given a Rayleigh process $$R(s)$$ generating samples $$X_i$$ -- which is equivalent to taking the radius of a bivariate normal process with zero correlation and both sigmas equal to $$s$$ -- what is the distribution of the extreme spread of $$n$$ samples?

Extreme spread $$ES\{X\}_n \equiv \max_{i, j}|X_i - X_j|$$

I want $$\mathrm{E}[ES_n(\sigma)]$$ and $$\Pr(ES_n(\sigma) > a)$$. It seems like $$ES_n$$ should have something like a chi distribution, but I am not good enough to derive the exact relationship.

I did find this paper from 1975, which on p. 8 suggests as much, but it focuses on empirical solutions rather than the pure math. In contrast, I want to take the parameters of the distribution of $$X$$ as given and find a formulaic distribution for $$ES$$.

Any guidance appreciated!

BGM

TS Contributor
Seems like you want to compute the CDF/pdf of the sample range (assuming the underlying sample $$X$$ is a continuous random variable) -- the difference between the sample maximum and the sample minimum, $$X_{(n)} - X_{(1)}$$:

http://en.wikipedia.org/wiki/Range_(statistics)

By elementary multinomial-type arguments, the joint pdf of $$(X_{(1)}, X_{(n)})$$ is

$$f_{X_{(1)}, X_{(n)}}(x_1, x_n)$$

$$= \frac {n!} {0!1!(n-2)!1!0!} F_X(x_1)^0f_X(x_1)^1[F_X(x_n) - F_X(x_1)]^{n-2}f_X(x_n)^1[1 - F_X(x_n)]^0$$

$$= n(n-1)f_X(x_1)f_X(x_n)[F_X(x_n) - F_X(x_1)]^{n-2}$$

where $$f_X, F_X$$ are the pdf and CDF of the original unordered random sample.

Using elementary transformation tricks, the pdf of the sample range is

$$f_R(r) = \int_{-\infty}^{+\infty} n(n-1)f_X(x)f_X(x+r)[F_X(x+r) - F_X(x)]^{n-2}dx, r > 0$$

By integration, the CDF is

$$F_R(r) = \int_{-\infty}^{+\infty} nf_X(x)[F_X(x+r) - F_X(x)]^{n-1}dx, r > 0$$

as listed in the above wiki link.
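These two integrals rarely simplify, but they are easy to evaluate numerically. As a sketch (my own code, not from the thread, assuming NumPy and SciPy are available; the choices n = 5, sigma = 1, r = 2 are arbitrary), one can check the CDF integral against a Monte Carlo estimate:

```python
import numpy as np
from scipy import integrate, stats

n, sigma = 5, 1.0
dist = stats.rayleigh(scale=sigma)  # Rayleigh(sigma)

def range_cdf(r):
    # F_R(r) = integral of n f(x) [F(x+r) - F(x)]^(n-1) dx over the support
    integrand = lambda x: n * dist.pdf(x) * (dist.cdf(x + r) - dist.cdf(x)) ** (n - 1)
    value, _ = integrate.quad(integrand, 0, np.inf)
    return value

# Monte Carlo estimate of Pr(sample range <= r) for comparison
rng = np.random.default_rng(0)
samples = rng.rayleigh(scale=sigma, size=(100_000, n))
spreads = samples.max(axis=1) - samples.min(axis=1)

r = 2.0
print(range_cdf(r), (spreads <= r).mean())  # the two should agree closely
```

The same template works for any continuous distribution: just swap in a different `dist`.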

Therefore one just needs to apply the above integral to find the answer. However, many such integrals are quite ugly and have no closed-form solution. For your case, let's try to calculate the CDF:

$$n\int_0^{+\infty} \frac {x} {\sigma^2} \exp\left\{-\frac {x^2} {2\sigma^2}\right\} \left[\exp\left\{-\frac {x^2} {2\sigma^2}\right\} - \exp\left\{-\frac {(x+r)^2} {2\sigma^2}\right\}\right]^{n-1} dx$$

$$= \frac {n} {\sigma^2} \int_0^{+\infty} x \exp\left\{-\frac {nx^2} {2\sigma^2}\right\} \left[1 - \exp\left\{-\frac {2rx + r^2} {2\sigma^2}\right\}\right]^{n-1}dx$$

$$= \frac {n} {\sigma^2} \int_0^{+\infty} x \exp\left\{-\frac {nx^2} {2\sigma^2}\right\} \sum_{k=0}^{n-1} \binom {n-1} {k} (-1)^k \exp\left\{-\frac {2krx + kr^2} {2\sigma^2}\right\} dx$$

$$= \frac {n} {\sigma^2} \sum_{k=0}^{n-1} \binom {n-1} {k} (-1)^k \int_0^{+\infty} x \exp\left\{-\frac {nx^2 + 2krx + kr^2} {2\sigma^2}\right\} dx$$

$$= \frac {n} {\sigma^2} \sum_{k=0}^{n-1} \binom {n-1} {k} (-1)^k$$ $$\int_0^{+\infty} x \exp\left\{- \frac {n} {2\sigma^2} \left(x^2 + 2 \frac {kr} {n} x + \frac {k^2r^2} {n^2}\right) + \frac {1} {2\sigma^2} \left(\frac {k^2r^2} {n} - kr^2 \right)\right\} dx$$

$$= \frac {n} {\sigma^2} \sum_{k=0}^{n-1} \binom {n-1} {k} (-1)^k$$ $$\int_0^{+\infty} x \exp\left\{- \frac {n} {2\sigma^2} \left(x^2 + 2 \frac {kr} {n} x + \frac {k^2r^2} {n^2}\right) - \frac {1} {2\sigma^2} \left(kr^2 - \frac {k^2r^2} {n}\right)\right\} dx$$

$$= \frac {n} {\sigma^2} \sum_{k=0}^{n-1} \binom {n-1} {k} (-1)^k \exp\left\{-\frac {kr^2} {2\sigma^2} \left(1 - \frac {k} {n}\right) \right\}$$ $$\int_0^{+\infty} x \exp\left\{- \frac {n} {2\sigma^2} \left(x + \frac {kr} {n}\right)^2 \right\}dx$$

Finally we consider:

$$\int_0^{+\infty} \left(x + \frac {kr} {n} \right) \exp\left\{- \frac {n} {2\sigma^2} \left(x + \frac {kr} {n}\right)^2 \right\}dx = \frac {\sigma^2} {n} \exp\left\{-\frac {k^2r^2} {2n\sigma^2}\right\}$$

and

$$\int_0^{+\infty} \exp\left\{- \frac {n} {2\sigma^2} \left(x + \frac {kr} {n}\right)^2 \right\}dx$$

$$= \sqrt{2\pi\frac {\sigma^2} {n}} \int_0^{+\infty} \frac {1} {\sqrt{2\pi\frac {\sigma^2} {n}}} \exp\left\{- \frac {n} {2\sigma^2} \left(x + \frac {kr} {n}\right)^2 \right\}dx$$

$$= \sqrt{2\pi\frac {\sigma^2} {n}} \Pr\left\{\frac {\sigma} {\sqrt{n}} Z - \frac {kr} {n} > 0 \right\}$$

$$= \sqrt{2\pi\frac {\sigma^2} {n}}\left[1 - \Phi\left(\frac {kr} {\sigma\sqrt{n}}\right)\right]$$

where $$Z$$ is the standard normal random variable and $$\Phi$$ is its CDF.

As a result, the CDF is

$$= \frac {n} {\sigma^2} \sum_{k=0}^{n-1} \binom {n-1} {k} (-1)^k \exp\left\{-\frac {kr^2} {2\sigma^2} \left(1 - \frac {k} {n}\right) \right\}$$ $$\int_0^{+\infty} \left(x + \frac {kr} {n} - \frac {kr} {n} \right) \exp\left\{- \frac {n} {2\sigma^2} \left(x + \frac {kr} {n}\right)^2 \right\}dx$$

$$= \frac {n} {\sigma^2} \sum_{k=0}^{n-1} \binom {n-1} {k} (-1)^k \exp\left\{-\frac {kr^2} {2\sigma^2} \left(1 - \frac {k} {n}\right) \right\}$$ $$\left\{\frac {\sigma^2} {n} \exp\left\{-\frac {k^2r^2} {2n\sigma^2}\right\} - \frac {kr} {n} \sqrt{2\pi\frac {\sigma^2} {n}}\left[1 - \Phi\left(\frac {kr} {\sigma\sqrt{n}}\right)\right] \right\}$$

But as you see it is quite a mess.

dbooksta

New Member
Wow. Just curious: are there any continuous distributions that have clean range distributions?

As for this case: Can you see any way to pull sigma outside of the outer sum?

I ask because in practice I will only be concerned with $$n$$ in the range [2, 100], but for arbitrary sigma. If we can establish a relationship between the CDF (given $$n$$) and sigma, then I could compute the values for each $$n$$ with sigma = 1 just once and then use those reference values to compute the probability for any other sigma.

Or is it known that there's no better way to get values than to do the full sum you derived?

dbooksta

New Member
Another way of putting my follow-up question: intuitively we expect some scale invariance of the extreme spread w.r.t. sigma. I.e., holding $$n$$ constant, if $$\mathrm{E}[ES_n(\sigma)] = X$$ then I would expect something like $$\mathrm{E}[ES_n(a \sigma)] = aX$$ or $$\mathrm{E}[ES_n(a \sigma)] = \sqrt{a} X$$.

BGM

TS Contributor
Yes, you have raised a very good point that I missed: $$\sigma$$ is a scale parameter.

We have the following:

If $$U \sim \text{Rayleigh}(1)$$, then $$X = \sigma U \sim \text{Rayleigh}(\sigma)$$

Since the scaling does not affect the order (remember $$\sigma > 0$$), we have

$$X_{(n)} - X_{(1)} = \sigma U_{(n)} - \sigma U_{(1)} = \sigma (U_{(n)} - U_{(1)})$$

i.e. again $$\sigma$$ is the scale parameter of the sample range.

Note that the distribution of $$U_{(n)} - U_{(1)}$$ does not depend on $$\sigma$$;

and we have the relationship

$$E[X_{(n)} - X_{(1)}] = \sigma E[U_{(n)} - U_{(1)}]$$

$$F_{X_{(n)} - X_{(1)}}(r)$$

$$= \Pr\{X_{(n)} - X_{(1)} \leq r\}$$

$$= \Pr\left\{U_{(n)} - U_{(1)} \leq \frac {r} {\sigma} \right\}$$

$$= F_{U_{(n)} - U_{(1)}}\left(\frac {r} {\sigma} \right)$$

and you can see this relationship in the answer of my previous post:

$$\sum_{k=0}^{n-1} \binom {n-1} {k} (-1)^k \exp\left\{-\frac {k} {2} \left(\frac {r} {\sigma}\right)^2 \left(1 - \frac {k} {n}\right) \right\}$$ $$\left\{\exp\left\{-\frac {k^2} {2n} \left(\frac {r} {\sigma}\right)^2\right\} - k \left( \frac {r} {\sigma} \right)\sqrt{\frac {2\pi} {n}}\left[1 - \Phi\left(\frac {k} {\sqrt{n}} \times \frac {r} {\sigma}\right)\right] \right\}$$

- it can be written as a function of $$\frac {r} {\sigma}$$
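The scale relationship is also easy to confirm by simulation. A quick sketch (my own code, assuming NumPy; n = 10, sigma = 3.5, and the quantiles are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma, trials = 10, 3.5, 100_000

# Sample ranges from a unit-scale and a sigma-scale Rayleigh process
u = rng.rayleigh(scale=1.0, size=(trials, n))
x = rng.rayleigh(scale=sigma, size=(trials, n))
spread_u = u.max(axis=1) - u.min(axis=1)
spread_x = x.max(axis=1) - x.min(axis=1)

# Every quantile should scale by sigma: F_X^{-1}(p) = sigma * F_U^{-1}(p)
for p in (0.25, 0.5, 0.9):
    print(p, np.quantile(spread_x, p), sigma * np.quantile(spread_u, p))
```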

dbooksta

New Member
Ah ha, I guess this makes sense. So there is scale invariance, but the probability I'm looking for is a function of both $$n$$ and the spread measured in units of sigma. I.e., a lookup table would have to provide precomputed values in two dimensions: $$(n, \frac{r}{\sigma})$$.
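A minimal sketch of that lookup-table idea (the function names, grid, and n values below are my own illustrative choices): precompute the sigma = 1 CDF on a grid of r for each n of interest, then evaluate any other sigma by rescaling the argument to r/sigma.

```python
import numpy as np
from math import comb, exp, pi, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

def unit_cdf(r, n):
    # Closed-form extreme-spread CDF for sigma = 1
    total = 0.0
    for k in range(n):
        inner = (exp(-k**2 * r**2 / (2 * n))
                 - k * r * sqrt(2 * pi / n) * (1 - Phi(k * r / sqrt(n))))
        total += comb(n - 1, k) * (-1)**k * exp(-k * r**2 / 2 * (1 - k / n)) * inner
    return total

# Precompute once for sigma = 1 ...
ns = (2, 5, 10, 30)
grid = np.linspace(0.0, 10.0, 501)
table = {n: np.array([unit_cdf(r, n) for r in grid]) for n in ns}

def lookup_cdf(r, n, sigma):
    # ... then any sigma is just a rescaled lookup at r / sigma
    return float(np.interp(r / sigma, grid, table[n]))
```

One caveat: the alternating-sign sum can lose precision for large n, so it is worth spot-checking table values against simulation before trusting them at n near 100.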

So to summarize
• There's a separate CDF for each n
• For a Rayleigh process the CDF of the extreme spread is the sum you derived:
$$\sum_{k=0}^{n-1} \binom {n-1} {k} (-1)^k \exp\left\{-\frac {k} {2} \left(\frac {r} {\sigma}\right)^2 \left(1 - \frac {k} {n}\right) \right\}$$ $$\left\{\exp\left\{-\frac {k^2} {2n} \left(\frac {r} {\sigma}\right)^2\right\} - k \left( \frac {r} {\sigma} \right)\sqrt{\frac {2\pi} {n}}\left[1 - \Phi\left(\frac {k} {\sqrt{n}} \times \frac {r} {\sigma}\right)\right] \right\}$$
• When looking for extreme-spread probabilities from a Rayleigh process with $$n$$ samples, once we compute a particular point on the CDF for that $$n$$, the same value holds for every sigma at the same scaled spread $$\frac{r}{\sigma}$$.

Is this correct?

BGM

TS Contributor
I think you got it. Hopefully there are no careless mistakes in the above calculations; I did not check them many times.