I'm analysing two categorical variables, Landslides (YES/NO) and Vegetation (grouped into 18 classes), and I'd like to explore and model the relationship between them.

Firstly, I've carried out a chi-square test of independence, which returned a significant result, i.e. the null hypothesis of independence is rejected and the two variables appear to be associated.

Secondly, I would like to know whether one of the variables depends on the other, and if so, how I can describe and model this relationship. My first thought here was to carry out a logistic regression analysis of the two variables, testing them against each other to see which returns a more "successful" result.
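To make the logistic regression idea concrete, here is a minimal sketch of what I think the model would estimate if Landslides were the response and Vegetation the only (dummy-coded) predictor. With a single categorical predictor, the intercept is the log-odds of a landslide in the reference class, and each coefficient is the difference in log-odds between a class and that reference. The counts, the class names and the function names below are made up for illustration and are not my data; the sketch also assumes every class has at least one YES and one NO.

```python
import math


def class_log_odds(counts):
    """Map each class to log(n_yes / n_no), the observed log-odds of a landslide."""
    return {name: math.log(yes / no) for name, (yes, no) in counts.items()}


def coefficients_vs_reference(counts, reference):
    """Return (intercept, coefficients) as a dummy-coded logistic regression
    with a single categorical predictor would estimate them."""
    log_odds = class_log_odds(counts)
    intercept = log_odds[reference]
    coefs = {
        name: value - intercept
        for name, value in log_odds.items()
        if name != reference
    }
    return intercept, coefs


# Hypothetical counts per vegetation class: (landslide YES, landslide NO)
toy_counts = {"forest": (12, 88), "shrub": (30, 70), "grass": (20, 20)}

intercept, coefs = coefficients_vs_reference(toy_counts, reference="forest")

# exp(coefficient) is the odds ratio of a class relative to the reference class
odds_ratios = {name: math.exp(value) for name, value in coefs.items()}
print(intercept, coefs, odds_ratios)
```

If this is right, the fitted probabilities are simply the observed proportions per class, so the model mainly adds odds ratios and standard errors on top of the contingency table.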

My doubts about this procedure are as follows.

Firstly, the chi-square contingency table had various cells with expected frequencies of less than 5 (14.7% of the cells, to be exact). From reading around, some authors say that this proportion should be below 20%; however, I've also read that in cases like these Fisher's exact test would be better. Should I do this first to see if my results are reliable?
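For reference, this is a minimal sketch of how I understand the expected frequencies and the share of small cells are computed. Each expected count is row total times column total divided by the grand total, and the rule of thumb I mentioned compares the share of cells below 5 against 20%. The table and the function names are made up for illustration, not my real counts.

```python
def expected_counts(table):
    """Expected cell counts under independence:
    row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def share_of_small_cells(table, threshold=5.0):
    """Fraction of cells whose expected count is below the threshold."""
    cells = [e for row in expected_counts(table) for e in row]
    return sum(e < threshold for e in cells) / len(cells)


def chi_square_statistic(table):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical table: rows = vegetation classes, columns = landslide (YES, NO)
toy_table = [
    [12, 88],
    [30, 70],
    [2, 8],
]

print(share_of_small_cells(toy_table))
print(chi_square_statistic(toy_table))
```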

Secondly, the Vegetation variable is grouped into 18 classes. Is this too many? I could combine some classes if need be.
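If combining classes is the way to go, one purely mechanical option would be to pool every class with few observations into a single class, as in the sketch below. The threshold of 10, the name "other", the class names and the counts are all assumptions made for illustration; in practice I would rather merge classes that are ecologically similar.

```python
def merge_sparse_classes(counts, min_total=10, merged_name="other"):
    """Pool classes with fewer than min_total observations into one class.

    counts maps class name -> (landslide YES, landslide NO).
    """
    merged = {}
    pooled_yes, pooled_no = 0, 0
    for name, (yes, no) in counts.items():
        if yes + no < min_total:
            pooled_yes += yes
            pooled_no += no
        else:
            merged[name] = (yes, no)
    # Only add the pooled class if something was actually pooled
    if pooled_yes + pooled_no > 0:
        merged[merged_name] = (pooled_yes, pooled_no)
    return merged


# Hypothetical counts per vegetation class: (landslide YES, landslide NO)
sparse_counts = {"forest": (12, 88), "shrub": (30, 70), "moss": (1, 4), "bare": (2, 3)}

merged_counts = merge_sparse_classes(sparse_counts)
print(merged_counts)
```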

Thirdly, if the variables are not independent, what should be the next step? How can I know which variable is dependent and how should I test this?

Many thanks in advance!!

Matt

P.S. While I wait for an answer, I'll start looking at the Fisher test.