Significance of Correspondence Analysis clusterings

gianmarco

TS Contributor
#1
Dear All,
I would like to ask you about Correspondence Analysis.

I am wondering whether any strategy exists to test the significance of the groupings revealed by CA. I understand that the chi-square test can be applied to verify whether a statistically significant association exists between rows and columns (using the total inertia multiplied by the sample size, and the associated degrees of freedom).
This informs us about the overall association.
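
A minimal sketch of that overall test, assuming NumPy and SciPy are available. The table `N` is made up purely for illustration, and the function name `inertia_chi2_test` is hypothetical. It computes the total inertia from the correspondence matrix, multiplies it by the sample size to get the chi-square statistic, and uses (rows - 1) x (columns - 1) degrees of freedom:

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical 4x3 contingency table (made-up counts, for illustration only).
N = np.array([[20, 15, 5],
              [10, 25, 15],
              [5, 10, 30],
              [18, 12, 6]], dtype=float)


def inertia_chi2_test(table):
    """Return (total inertia, chi-square statistic, dof, p-value)."""
    table = np.asarray(table, dtype=float)
    n = table.sum()                      # sample size
    P = table / n                        # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)            # proportions expected under independence
    total_inertia = ((P - expected) ** 2 / expected).sum()
    stat = n * total_inertia             # chi-square = n x total inertia
    dof = (table.shape[0] - 1) * (table.shape[1] - 1)
    p_value = chi2.sf(stat, dof)         # upper-tail probability
    return total_inertia, stat, dof, p_value


total_inertia, stat, dof, p_value = inertia_chi2_test(N)
print(f"inertia={total_inertia:.4f} chi2={stat:.2f} dof={dof} p={p_value:.3g}")
```

The statistic should agree with the ordinary Pearson chi-square computed directly from the counts, which is a quick way to check the CA output.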

But, moving from an exploratory framework to a hypothesis-testing perspective, how can one pinpoint which profiles are significantly different in a statistical sense (e.g., to test whether a clustering can be considered significant)?

I have read Greenacre, "Correspondence Analysis in Practice" (2008), Chapter 15. Interestingly, he discusses Ward clustering performed on the row/column profiles and their associated masses. I would like to know whether anyone has used this approach.

Any comment on the general issue, or on the aforementioned reference and related technique, is welcome.

Thank you
Kind regards and happy new year

Gianmarco
 

terzi

TS Contributor
#2
Hi gianmarco,

In chapter 15, Greenacre discusses some methods for collapsing categories, that is, merging some categories together. As far as I remember, there are no inferential procedures involved. This method is commonly used in some segmentation procedures, such as CHAID analysis.

Now, if I understood well, you are looking for some way to test whether a relationship between a particular row and a particular column is significant. I don't know if that's what you are trying to do, but as far as I know that goes beyond Correspondence Analysis. Try reading about Log Linear Models for Contingency Tables. These methods may give you some information related to inferential tests.
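
As a very small entry point to log-linear models, here is a hedged sketch of the simplest one, the independence model for a two-way table. The counts are made up and the name `independence_loglinear` is hypothetical. For this particular model the fitted counts have a closed form (row total x column total / n), so no model-fitting library is needed, and the likelihood-ratio statistic G² compares it with the saturated model. Richer log-linear models generally need a Poisson GLM routine, which is not shown here:

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical 4x3 contingency table (made-up counts, no zero cells).
N = np.array([[20, 15, 5],
              [10, 25, 15],
              [5, 10, 30],
              [18, 12, 6]], dtype=float)


def independence_loglinear(table):
    """Fit log(mu_ij) = lambda + lambda_i(row) + lambda_j(col); return (fitted, G2, dof, p)."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # Maximum-likelihood fitted counts under independence.
    fitted = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    # Deviance (likelihood-ratio statistic) against the saturated model.
    # Assumes every observed count is positive, so the logarithm is defined.
    g2 = 2.0 * (table * np.log(table / fitted)).sum()
    dof = (table.shape[0] - 1) * (table.shape[1] - 1)
    return fitted, g2, dof, chi2.sf(g2, dof)


fitted, g2, dof, p_value = independence_loglinear(N)
print(f"G2={g2:.2f} dof={dof} p={p_value:.3g}")
```

The cell-by-cell differences between `N` and `fitted` show where the table departs from independence, which is the kind of information the overall chi-square does not localize.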

Hope this helps
 

gianmarco

TS Contributor
#3
Hi Terzi,
thanks for your reply.

You have understood my issue well. I will try to get some information about Log Linear Models: I am totally new to this, and I fear that I will have to start from a very basic level. Anyway, thank you very much.

As for Greenacre's Chapter 15: that chapter and chapter 25 deal with the issue of stability and inference. Chapter 15 describes an interesting analysis that merges rows/columns in order to pinpoint where the statistically significant association lies. It proposes, among other things, a weighted Ward clustering. It mentions the use of XLSTAT and suggests something that I did not grasp (see the very end of the chapter, on page 120). I understand that the chi-square distances and the masses (both taken from the CA output) have to be taken into account, but I cannot figure out how to set the whole thing up in XLSTAT.
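
To make the idea concrete outside XLSTAT, here is a rough sketch of a mass-weighted Ward clustering of the rows, written with plain NumPy. This is my own reading of the idea, not XLSTAT's procedure and not necessarily identical to the book's; the table and the name `weighted_ward_rows` are hypothetical. At each step it merges the two clusters whose pooling loses the least inertia, where the loss for clusters with masses m_a, m_b and profiles a, b is m_a * m_b / (m_a + m_b) times their squared chi-square distance:

```python
import numpy as np

# Hypothetical 4x3 contingency table (made-up counts, for illustration only).
N = np.array([[20, 15, 5],
              [10, 25, 15],
              [5, 10, 30],
              [18, 12, 6]], dtype=float)


def weighted_ward_rows(table):
    """Return a list of merges: (members_a, members_b, inertia_lost, chi2_lost)."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    col_mass = table.sum(axis=0) / n     # column masses define the chi-square metric
    # Each cluster stores its member rows and their pooled counts.
    clusters = {i: ([i], table[i].copy()) for i in range(table.shape[0])}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                counts_a, counts_b = clusters[a][1], clusters[b][1]
                mass_a, mass_b = counts_a.sum() / n, counts_b.sum() / n
                # Squared chi-square distance between the two cluster profiles.
                diff = counts_a / counts_a.sum() - counts_b / counts_b.sum()
                dist2 = (diff ** 2 / col_mass).sum()
                loss = mass_a * mass_b / (mass_a + mass_b) * dist2
                if best is None or loss < best[0]:
                    best = (loss, a, b)
        loss, a, b = best
        merges.append((clusters[a][0], clusters[b][0], loss, n * loss))
        # Pooling the counts gives the mass-weighted average profile.
        clusters[a] = (clusters[a][0] + clusters[b][0], clusters[a][1] + clusters[b][1])
        del clusters[b]
    return merges


merges = weighted_ward_rows(N)
for members_a, members_b, inertia_lost, chi2_lost in merges:
    print(members_a, "+", members_b, f"inertia lost={inertia_lost:.4f} chi2 lost={chi2_lost:.2f}")
```

The inertia lost over all merges adds up to the total inertia of the table, so each merge can be read as the share of the overall chi-square that disappears when those rows are pooled. The sketch assumes no row or column of the table is entirely zero.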

Any idea is welcome.

Thank you again.

Best Regards,
Gm
 

terzi

TS Contributor
#4
Hi

Hi again gianmarco,

Since I have the Spanish version of the book, the pages don't match; in mine, chapter 15 starts at page 155 :p. Nevertheless, the inferential procedure referred to at the end of the chapter is meant to compare rows with rows, or columns with columns. That is, it is a procedure for inferring whether a particular row A differs statistically from another row B. To do this, all you have to do is calculate the chi-square statistic of the reduced table, containing only the rows (or columns) you wish to compare. The same approach lets you test differences between columns.
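
A minimal sketch of that reduced-table test, assuming SciPy is available; the counts are made up and `compare_rows` is a hypothetical helper name. It keeps only the chosen rows and runs an ordinary Pearson chi-square test on them. Note that this is the plain test, with no adjustment for having picked the rows after looking at the CA map:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 4x3 contingency table (made-up counts, for illustration only).
N = np.array([[20, 15, 5],
              [10, 25, 15],
              [5, 10, 30],
              [18, 12, 6]], dtype=float)


def compare_rows(table, rows):
    """Chi-square test on the sub-table made of the given rows; returns (stat, dof, p)."""
    reduced = np.asarray(table, dtype=float)[list(rows), :]
    # Drop columns that are empty in the reduced table (expected counts would be zero).
    reduced = reduced[:, reduced.sum(axis=0) > 0]
    stat, p_value, dof, _ = chi2_contingency(reduced, correction=False)
    return stat, dof, p_value


# Rows 0 and 3 have similar profiles; rows 0 and 2 do not.
print(compare_rows(N, [0, 3]))
print(compare_rows(N, [0, 2]))

# To compare columns instead, transpose the table first.
print(compare_rows(N.T, [0, 2]))
```

Groups of rows can be compared the same way by summing the counts within each group before building the reduced table.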

I understood that you wanted to test rows against columns, which is a different matter. Remember that CA is a descriptive tool, and for testing hypotheses you usually (though certainly not always) use models. As I stated before, I'm not aware of a method to do this using CA :(. If any other member knows of some recent developments, it would be really interesting to read about them.

I haven't read the last chapter of the book, but after a peek, it seems to discuss tests on the components and on the overall inertia, which is a different topic.

Good luck
 

gianmarco

TS Contributor
#5
Hi Terzi,
thank you for your reply :).

Comparing groups of rows (or columns) is exactly what I want to do :tup:. Thanks for summarizing Greenacre's procedure and for the approach you pointed out in your earlier post.

I am sorry if my bad English led you astray.

Best regards,
Gm