Testing the association between points and any of the points belonging to another set of points: am I approaching it correctly?

gianmarco

TS Contributor
#1
Hello,
I am trying to put together an R function to test whether points in a given area tend to occur closer to any point belonging to another set of points, or whether they are distributed irrespective of their distance from the latter.

I would like to have some feedback on the method I applied, especially on the use of the binomial distribution.

With reference to the attached image, let's imagine we have some locations (crosses) and some other locations (red dots). For the sake of argument, let's assume the crosses are water pumps, and the dots the locations where a disease is recorded as present.
The question to address might be: do diseases tend to occur in the vicinity of any of the water pumps?
The method:
  • divide the area around the water pumps using Thiessen polygons; any disease falling in a polygon will be closer to that polygon's source (i.e., water pump) than to any other source;
  • calculate the area of each polygon as a proportion of the total area covered by the polygons (let's call it %area);
  • count how many diseases fall in each polygon (let's call it bypoly.points);
  • calculate the expected number of diseases in each polygon; it should be (if I am not mistaken) equal to the total number of diseases (i.e., total number of points in the whole study area) times the %area;
  • calculate the probability of the observed count in each polygon: dbinom(bypoly.points, tot.n.points, %area)
  • calculate the probability of a count <= the observed one: pbinom(bypoly.points, tot.n.points, %area)
  • calculate the probability of a count > the observed one (the upper tail; note that this excludes the observed count itself): 1-pbinom(bypoly.points, tot.n.points, %area)
The results I got are:
Code:
     polygon.area %area obs.n.points exp.n.points   p.obs p.<=exp p.>=exp
[1,]      4466753  0.14           17         3.51 0.00000 1.00000 0.00000
[2,]      5845596  0.18            0         4.59 0.00628 0.00628 0.99372
[3,]      6105211  0.19            4         4.79 0.19566 0.46190 0.53810
[4,]      8533652  0.27            2         6.70 0.01648 0.02064 0.97936
[5,]      6888852  0.22            2         5.41 0.05154 0.06935 0.93065
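For anyone wanting to reproduce the table, here is a minimal sketch in base R that uses only the polygon areas and observed counts reported above. The variable names (p.area, p.le.obs, p.gt.obs, p.ge.obs) are made up for illustration and are not taken from the original function. The last column is an extra: it gives the upper tail including the observed count, P(X >= k), which differs from 1 - pbinom(k, n, p).

```r
# Polygon areas and observed counts, as reported in the table above
polygon.area <- c(4466753, 5845596, 6105211, 8533652, 6888852)
obs.n.points <- c(17, 0, 4, 2, 2)

tot.n.points <- sum(obs.n.points)            # total number of points
p.area       <- polygon.area / sum(polygon.area)  # area share (the "%area" column)
exp.n.points <- tot.n.points * p.area        # expected count per polygon

p.obs    <- dbinom(obs.n.points, tot.n.points, p.area)  # P(X = k)
p.le.obs <- pbinom(obs.n.points, tot.n.points, p.area)  # P(X <= k)
p.gt.obs <- 1 - p.le.obs                                # P(X > k), as in the table
# Upper tail that includes the observed count itself: P(X >= k)
p.ge.obs <- pbinom(obs.n.points - 1, tot.n.points, p.area, lower.tail = FALSE)

res <- cbind(polygon.area, p.area, obs.n.points, exp.n.points,
             p.obs, p.le.obs, p.gt.obs, p.ge.obs)
print(round(res, 5))
```

Since five polygons are tested at once, the per-polygon probabilities would probably need some multiple-comparison adjustment before being read as significance levels.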
The attached image shows some relevant information derived from the results above.
The results seem to make sense, at least to me, but I would very much like some feedback, especially on the probability calculations.

Gm
 

Dason

Ambassador to the humans
#2
I'm not exactly sure how you're using the results from the binomial distribution here. But you're able to calculate observed number of points and expected number of points based on the area percentages. If your question basically is if the observed points have the same distribution into polygons as you would expect based on the areas then why not just use a simple chi-square test?
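As a rough sketch of what that could look like in base R, using the areas and counts from the table in post #1 (the names p.area and gof are just placeholders). Several expected counts are below 5, so the usual chi-square approximation is doubtful; this sketch assumes a Monte Carlo p-value via simulate.p.value = TRUE is an acceptable substitute.

```r
# Areas and observed counts taken from the table in post #1
polygon.area <- c(4466753, 5845596, 6105211, 8533652, 6888852)
obs.n.points <- c(17, 0, 4, 2, 2)

# Null hypothesis: points fall in polygons in proportion to polygon area
p.area <- polygon.area / sum(polygon.area)

# Several expected counts are below 5, so a simulated (Monte Carlo)
# p-value is requested instead of the asymptotic chi-square one.
set.seed(1)  # arbitrary seed, only for reproducibility
gof <- chisq.test(obs.n.points, p = p.area,
                  simulate.p.value = TRUE, B = 9999)

print(gof)
print(round(gof$expected, 2))   # expected counts under the null
print(round(gof$residuals, 2))  # Pearson residuals: which polygons deviate most
```

This gives a single overall test; the Pearson residuals then indicate which polygons have more or fewer points than their area would suggest.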
 

gianmarco

TS Contributor
#3
I came up with the approach on the basis of O'Sullivan and Unwin, "Geographic Information Analysis", 2010, p. 104, who say:
"...the probability p in the quadrat counting case is given by the size of each quadrat relative to the size of the study region.....This gives us the final expression for the probability distribution of the quadrat counts for a point pattern generated by Independent Random Process....which is simply a binomial distribution with p=1/x [in case of quadrats of equal size]....."
Now, my main issue is whether I am right in thinking that, while for equal-sized polygons (quadrats) p=1/x, for unequal-sized polygons p=size of each quadrat relative to the size of the study region.
 
#4
Now, my main issue is whether I am right in thinking that, while for equal-sized polygons (quadrats) p=1/x, for unequal-sized polygons
Yes, I think you are right.

If you have an area where events (like points on the x,y plane) occur with uniform density, and you take a small part of that area, irregular polygon or not, then the number k of events falling in the polygon is binomially distributed out of n events (given that there have been n events in total in the whole area), with a probability p equal to the polygon's share of the total area.

There is a fun way of estimating pi (the 3.14... number) in that way.
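
Presumably this refers to the classic Monte Carlo trick: scatter points uniformly over a square, count how many land inside the inscribed circle, and use the fact that the circle's share of the square's area is pi/4. A minimal sketch in R (the sample size and seed are arbitrary choices):

```r
set.seed(42)   # arbitrary seed, only for reproducibility
n <- 100000    # number of random points

# Uniform points in the square [-1, 1] x [-1, 1] (area 4)
x <- runif(n, -1, 1)
y <- runif(n, -1, 1)

# The unit circle has area pi, so its share of the square is pi / 4
inside <- x^2 + y^2 <= 1
p.hat  <- mean(inside)   # estimated share of points inside the circle
pi.hat <- 4 * p.hat      # estimate of pi

# The count inside the circle is binomial(n, pi/4), hence this standard error
se <- 4 * sqrt(p.hat * (1 - p.hat) / n)
cat("estimate of pi:", pi.hat, " standard error:", se, "\n")
```

The count of points inside the circle is exactly the binomial situation described above, with p equal to the circle's share of the total area.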