quantile-quantile plot (qqplot) in R by hand-ish

trinker

ggplot2orBust
#1
My understanding of a qqplot was it was the sorted values for a variable on the y against the theoretical values from a normal (or whatever distribution) distribution on the x. We can get the y values from our sample and the x values from looking up the p value in a given distribution and getting the corresponding nromal value. Now to get the p value I thought we divide the 1 through length of the sample by n + 1. I have never actually tried to do this and used qqplot or the same geom_qq from ggplot. But today I tried to make the qqplot by hand and I don't get the same scale for the theoretical values and am unsure why. The points look the same but the scale is different. Why?

Code:
x <- c(3.89, 4.75, 4.75, 5.2, 5.78, 5.8, 6.33, 7.21, 7.9)
par(mfrow = c(2, 1))
plot(qnorm((1:length(x))/(1 + length(x))), sort(x))
qqnorm(x)
1522724321437.png
 
Last edited:

bryangoodrich

Probably A Mammal
#3
I'd recommend using axTicks or par('xaxp') to get the axis range of the plot and ensure they're exactly the same on your plot. They look close but maybe the auto ranging your manual plotting is doing isn't exactly what qqnorm is doing.
 

trinker

ggplot2orBust
#5
Thanks @Dason

Code:
myppoints <- function(x, a = if(length(x) <= 10) 3/8 else 1/2) ((1:length(x)) - a)/(length(x) + (1-a)-a) 

par(mfrow = c(2, 1))
plot(qnorm(myppoints(x)), sort(x), xlim = c(-1.5, 1.5))
lines(qnorm(.25), )
qqnorm(x)
qqline(x)
Not sure yet why the adjustments but the paper references in there should give me good direction.
 

j58

Active Member
#6
To generate the probabilities at which quantiles are plotted, I think R's qqnorm function uses the ppoints function, which tweaks your formula witn an adjustment "a". See ?ppoints for details. I think if you adjust your formula by using the appropriate value of "a", your plot will match the qqnorm" plot exactly.
 
Last edited:

Dason

Ambassador to the humans
#7
We could do a quick simulation to see how far off the different methods are compared to the true expected values for a standard normal order statistic.

Code:
sim_normal_quants <- function(n, N = 10000){

  simulated <- rowMeans(replicate(N, sort(rnorm(n))))
  ppoints <- qnorm(ppoints(n))
  trinker <- qnorm((1:n)/(n+1))
  output <- data.frame(simulated = simulated,
                       ppoints = ppoints,
                       trinker = trinker)
  return(output)
}

sim_normal_quants(4)
sim_normal_quants(50)
which gives
Code:
> sim_normal_quants(4)
   simulated    ppoints    trinker
1 -1.0405191 -1.0491314 -0.8416212
2 -0.2944362 -0.2993069 -0.2533471
3  0.2909518  0.2993069  0.2533471
4  1.0371498  1.0491314  0.8416212
> sim_normal_quants(50)
     simulated     ppoints     trinker
1  -2.24390666 -2.32634787 -2.06191650
2  -1.85094300 -1.88079361 -1.75986103
3  -1.62489490 -1.64485363 -1.56472647
4  -1.46074006 -1.47579103 -1.41570209
5  -1.33244967 -1.34075503 -1.29280523
6  -1.21819229 -1.22652812 -1.18683143
7  -1.12003084 -1.12639113 -1.09273583
8  -1.03131298 -1.03643339 -1.00743560
9  -0.94953019 -0.95416525 -0.92889949
10 -0.87458336 -0.87789630 -0.85571243
11 -0.80447467 -0.80642125 -0.78684510
12 -0.73747706 -0.73884685 -0.72152228
13 -0.67309002 -0.67448975 -0.65914304
14 -0.61245929 -0.61281299 -0.59922987
15 -0.55415116 -0.55338472 -0.54139509
16 -0.49748893 -0.49585035 -0.48531773
17 -0.44164767 -0.43991317 -0.43072730
18 -0.38743874 -0.38532047 -0.37739194
19 -0.33526380 -0.33185335 -0.32510971
20 -0.28321250 -0.27931903 -0.27370189
21 -0.23237921 -0.22754498 -0.22300783
22 -0.18081577 -0.17637416 -0.17288083
23 -0.13007357 -0.12566135 -0.12318477
24 -0.07954181 -0.07526986 -0.07379127
25 -0.03051773 -0.02506891 -0.02457726
26  0.02061124  0.02506891  0.02457726
27  0.07036406  0.07526986  0.07379127
28  0.12011423  0.12566135  0.12318477
29  0.17185304  0.17637416  0.17288083
30  0.22371810  0.22754498  0.22300783
31  0.27617981  0.27931903  0.27370189
32  0.32936040  0.33185335  0.32510971
33  0.38328008  0.38532047  0.37739194
34  0.43701258  0.43991317  0.43072730
35  0.49279471  0.49585035  0.48531773
36  0.54995486  0.55338472  0.54139509
37  0.60877160  0.61281299  0.59922987
38  0.66975272  0.67448975  0.65914304
39  0.73349832  0.73884685  0.72152228
40  0.80009623  0.80642125  0.78684510
41  0.87071796  0.87789630  0.85571243
42  0.94686075  0.95416525  0.92889949
43  1.02941918  1.03643339  1.00743560
44  1.11912989  1.12639113  1.09273583
45  1.21781114  1.22652812  1.18683143
46  1.33047860  1.34075503  1.29280523
47  1.46309818  1.47579103  1.41570209
48  1.62647989  1.64485363  1.56472647
49  1.85172464  1.88079361  1.75986103
50  2.25256765  2.32634787  2.06191650
 

Dason

Ambassador to the humans
#8
We could also numerically integrate to get the true expected value but I'm lazy so deal with it.