Chi-squared on binary presence/absence data?

Hi, I have a list of samples (10) and a list of compounds (about 250). I have generated a presence/absence table in which the first column is the list of compounds and the next 10 columns are my samples 1-10; a cell is marked with a "1" if a compound is present in that sample and "0" if it isn't.

I'm trying to determine whether the differences in the chemical composition of my samples are statistically significant, and my supervisor recommended I do a chi-squared test on the data. I have SPSS and PRIMER 6. I really have no idea how to go about this. I have generated some similarity matrices in PRIMER and got a dendrogram of my 10 samples, but I really need to be able to say whether or not the differences in the chemical compositions of these samples are statistically significant.

Any ideas?

Can anyone help me with this?


TS Contributor
Hello back!
In an earlier reply of mine (in a different thread of yours) I suggested using Correspondence Analysis. I understand that you may not be familiar with CA, but it surprises me that your supervisor suggests chi-square and, at the same time, is not aware of CA, whose underlying logic mainly rests on chi-square and deviation from independence.

Now, I believe that using chi-square is not sound in your case because you have a large contingency table, and locating which cells deviate significantly from independence would be time-consuming (to say the least). I mean, suppose you perform the test (provided that your table complies with the chi-square test assumptions) and it says 'yes, there is a significant dependence between samples and compounds'; you would then have to pinpoint where the significant differences (from independence) are. You would have to inspect the table of standardized residuals, which means eyeballing a 250x10 table. Isn't that time-consuming?!
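To give an idea of what "inspecting the residuals" involves, here is a minimal sketch in plain Python. It assumes the table is treated as an ordinary contingency table, that every row and column contains at least one "1" (otherwise an expected count is zero), and it uses a made-up 4x3 table, not your data. The function name `chi_square_residuals` and the cut-off of 1 are my own choices for illustration; the residuals are the Pearson ones, (observed - expected) / sqrt(expected).

```python
def chi_square_residuals(table):
    """Return (statistic, degrees_of_freedom, residuals) for a 2-D count table.

    Assumes every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            # Pearson residual for this cell.
            r = (observed - expected) / expected ** 0.5
            res_row.append(r)
            statistic += r * r  # chi-square is the sum of squared residuals
        residuals.append(res_row)

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof, residuals


# Hypothetical toy data: rows = compounds, columns = samples.
toy_table = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 1],
]

statistic, dof, residuals = chi_square_residuals(toy_table)
print(f"chi-square = {statistic:.3f} on {dof} degrees of freedom")

# List the cells whose residual is large in absolute value (arbitrary cut-off).
for i, res_row in enumerate(residuals):
    for j, r in enumerate(res_row):
        if abs(r) > 1.0:
            print(f"compound {i + 1}, sample {j + 1}: residual {r:+.2f}")
```

The sketch stops at the statistic and the residuals; it does not compute a p-value. With a 250x10 table the inner loop visits 2500 cells, which is exactly the inspection burden I was referring to.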

Secondly, using the chi-square test would not address the research question you described in your earlier thread (i.e., locating clusters or, put another way, looking for similarities between samples).

Again, a dimensionality-reduction technique is what I think you should look for. CA would possibly work.

Hi Gian,

Yes, I e-mailed my supervisor after I spoke to you and he seemed to suggest that a chi-squared test would be the best option for me -- perhaps he hasn't any CA experience. I'm in a bit of a situation though, because I have to present my final seminar in about 36 hours and we changed the direction of the project only yesterday. So it's a bit crazy for me, especially with my poor knowledge of statistics.
At the very least, you could also try MDS, as per your first thread title....
I generated a resemblance matrix in PRIMER 6 and the end result was a 10x10 matrix with a value in each cell representing the distance from "0 to inf". Most of the values lie within 6-8, and I have no idea what those numbers really mean in terms of how similar the data are. I have attached an image of the resemblance matrix, and the window you can see on the right is what comes up when I want to do an MDS on this resemblance matrix. Not really sure what it means!


TS Contributor
I am not familiar with Primer. I seem to recall that it is commercial (i.e., paid). For the future, you could try PAST, which has many functions (as many as Primer, I guess).

As for MDS (which I am not so familiar with), it works on a similarity matrix. That is, the program first calculates a similarity matrix from your data, and then MDS seeks to represent the similarity between samples (in your case) visually, as distances on a Euclidean plane. The choice of how the similarity matrix is calculated usually depends on many factors, among them the type of data you have (presence/absence, frequencies, etc.).
There are many similarity (and distance) measures, Euclidean distance among them. I seem to recall that with presence/absence data, the Jaccard coefficient would be a sound choice.
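For what it's worth, the Jaccard idea is simple enough to sketch in a few lines of plain Python. This is only an illustration on three made-up samples with four compounds each; the names `jaccard_distance` and `distance_matrix` are mine, and I am not claiming this is how Primer or PAST computes it. The distance between two samples is 1 minus (compounds present in both) / (compounds present in at least one).

```python
def jaccard_distance(a, b):
    """Jaccard distance between two equal-length 0/1 lists."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    if either == 0:
        # Neither sample contains any compound; treat them as identical.
        return 0.0
    return 1.0 - both / either


def distance_matrix(samples):
    """Pairwise Jaccard distances as a nested dict keyed by sample name."""
    names = list(samples)
    return {
        p: {q: jaccard_distance(samples[p], samples[q]) for q in names}
        for p in names
    }


# Hypothetical toy data: one 0/1 list per sample, one entry per compound.
toy_samples = {
    "S1": [1, 1, 0, 1],
    "S2": [1, 0, 0, 1],
    "S3": [0, 1, 1, 1],
}

distances = distance_matrix(toy_samples)
for name, row in distances.items():
    print(name, " ".join(f"{d:.2f}" for d in row.values()))
```

Unlike a measure that runs from "0 to inf", the Jaccard distance always lies between 0 (same set of compounds) and 1 (no compound in common), which makes the numbers easier to read. A matrix like this is the kind of input an MDS routine would take.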

I will limit myself to the above, since my knowledge of MDS is only slightly more than basic.