# making R computing large factorials

#### gianmarco

##### TS Contributor
I am writing a function in R to test whether a set of points falls inside a set of polygons more often than would be expected by chance. The calculation of p value entails using factorials:
C-like:
p.x <- round((factorial(n.of.points)/(factorial(x)*factorial(n.of.points-x)))*(p^x)*(q^(n.of.points-x)),5)
Now, it may happen that either 'n.of.points' or 'x' (or both) can be well above the limit of 170 (or 270, I do not recall) for the factorial function to be used. How can I still implement that calculation in R?

Last edited:

#### Stats fan

##### Member
p.x.log <- function(n.of.points, x, p, q) {
log.result <- sum(log(n.of.points:1)) - sum(log(x:1)) - sum(log((n.of.points-x):1)) + x*log(p) + (n.of.points-x)*log(q)
exp(log.result)
}

#### Dason

And q is just 1-p? Why not just use dbinom?

#### gianmarco

##### TS Contributor
p.x.log <- function(n.of.points, x, p, q) {
log.result <- sum(log(n.of.points:1)) - sum(log(x:1)) - sum(log((n.of.points-x):1)) + x*log(p) + (n.of.points-x)*log(q)
exp(log.result)
}
Thank you for the suggestion. It worked.

#### gianmarco

##### TS Contributor
And q is just 1-p? Why not just use dbinom?
Thank you for the hint, it looks way simpler!
May I ask you to give me some help in adapting that function to to my context? I have made some web search after your suggestion, and I realized that perhaps pbinom() should be used.
Now, what I am trying to implement is what follows, which is after the help manual of a spatial analysis program:

The inside mode tests to see whether a set of points falls inside a set of polygons more often than would be expected by chance. It assumes that the polygons represent a subset of the study plot, that is, the polygons are completely contained within the study plot. If As is the area of the study plot and Ap is the area of the polygons, the probability that a single point will be found within a polygon is simply p = Ap / As (assuming that point locations are random with respect to the polygon locations). The probability that a point will be found outside of the polygons is q = 1 – p. For n total points, the expected number of points within the polygons is simply np. This system can be described by a binomial distribution. The probability of observing a specific number of points (x) within polygons is therefore (see attached screenshot) The probability that x or fewer points will be found within the polygons is the sum of the above equation for counts from 1 to x. No permutation test is required because the expected distribution can be exactly determined.
Now, from this webpage (http://www.r-tutor.com/elementary-statistics/probability-distributions/binomial-distribution) I see an example of the use of pbinom(). I am wondering which parameter should I plug into the function?
My guess is below:
C-like:
pbinom(observed number of points, size=?total number of points?, prob=p)
I am uncertain about the 'size' parameter.

Last edited:

#### GretaGarbo

##### Human
I believe that you need dbinom(). ("d" for density.) pbinom gives you the cumulative distribution function.

Code:
# dbinom(x, size, prob, log = FALSE)

dbinom(2, size=4, prob= 0.5)
# 0.375

dbinom(0:4, size=4, prob= 0.5)
#  0.0625 0.2500 0.3750 0.2500 0.0625
x is the number "heads" and size is the number of coins tossed.
So I had (dbinom(2, size=4, prob= 0.5)) 2 heads and tossed the coin 4 times and probability of head=0.5.

#### gianmarco

##### TS Contributor
I believe that you need dbinom(). ("d" for density.) pbinom gives you the cumulative distribution function.

Code:
# dbinom(x, size, prob, log = FALSE)

dbinom(2, size=4, prob= 0.5)
# 0.375

dbinom(0:4, size=4, prob= 0.5)
#  0.0625 0.2500 0.3750 0.2500 0.0625
x is the number "heads" and size is the number of coins tossed.
So I had (dbinom(2, size=4, prob= 0.5)) 2 heads and tossed the coin 4 times and probability of head=0.5.
Greta, I think I need the cumulative distribution; note what follows (copied from my earlier thread, emphasis mine):
The probability that x or fewer points will be found within the polygons is the sum of the above equation for counts from 1 to x. No permutation test is required because the expected distribution can be exactly determined.
So, if I have x events (points), at the best of my (limited) understanding I need the cumulative prob for events from 1 to x.

#### GretaGarbo

##### Human

So, if I have x events (points), at the best of my (limited) understanding I need the cumulative prob for events from 1 to x.
You have n points, x of them falls into your polygon so that you have 0, 1, 2, 3 to x within polygon. x successes out of n.

I should say that I have not read carefully and I have not understood what you are really aiming to. (Just trying to contribute.)
But why should the points have a uniform distribution over the area investigated (so that there is a binomial distribution of the number of event within your polygon)? (I don't know what each spot is, lol.) If these spots are units of archaeological investigations spots, then I guess that you are not investigating at random, but rather where you believe that it will be the most successful, right?

#### gianmarco

##### TS Contributor
@GretaGarbo: just trying to reproduce in R the points-in-polygons test to which I was referring in my first post. See the quoted text and the attached image.

#### GretaGarbo

##### Human
Now, it may happen that either 'n.of.points' or 'x' (or both) can be well above the limit of 170 (or 270, I do not recall) for the factorial function to be used. How can I still implement that calculation in R?
Remember that as "n" gets large so that n*p is large, the binomial is approximated by the normal distribution. The rule of thumb used to be n*p>5. So if p is "in the middle" (0.2<p<0.8) and n is larger than 10, it is well approximated by the normal. (Of course mu=n*p and variance =n*p*(1-p).) Then there need not to be any numerical problems with large n.

#### gianmarco

##### TS Contributor
Thanks Greta. For the time being, I would like to stick with Dason's suggestion and using the pbinom(). So, given the parameters I have cited in the quotation in one of my earlier posts, is the way I am plugging the parameters in correct?
pbinom(observed number of points within polygons, size=?total number of points?, prob=p)

Last edited:

#### GretaGarbo

##### Human
So, given the parameters I have cited in the quotation in one of my earlier posts, is the way I am plugging the parameters in correct?
pbinom(observed number of points within polygons, size=?total number of points?, prob=p)
Yes.