# Chi Square question

#### Philip

##### New Member
Hi

If I have results that are X for 24 out of 39 examples
and results that are Y for the remaining 15,
can I use a chi-square (2*2) to demonstrate that - yes indeed - 24 out of 39 is more than (the remaining) 15 out of 39?
I just feel weird about that, as the two groups don't seem to be independent.
Is there another test I should do?

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Please describe in more detail what these data represent.

I would calculate the proportion with the event and calculate a confidence interval around it (binomial data). But without more information I am just guessing at the context and objective.
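
A minimal sketch of that interval calculation, assuming the 24-out-of-39 counts from the first post and a Wilson score interval (one common choice for binomial data; no specific method is named above). The function name `wilson_interval` is only illustrative.

```python
import math

def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (z of about 1.96 gives ~95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# Counts taken from the original question: 24 of 39 results were X.
low, high = wilson_interval(24, 39)
print(f"proportion = {24 / 39:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

With these counts the interval runs from roughly 0.46 to 0.75. Because it includes 0.5, the data would still be compatible with X and Y being equally likely.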

Say we were talking about males vs females in a sample. If 75% were female, well, I don't think you need any fancy stat to tell you more, unless you were trying to infer back to a population.

#### Philip

##### New Member
> Please describe in more detail what these data represent.

Oh I was afraid someone would ask me that
OK ...
I'll try ...

I'm trying to establish a "lexical corpus consistency" approach ...
That is ...to what degree can a corpus of texts be said to be lexically consistent ...
and there are many aspects to this ...
and this is all one ...

There are 90 different texts on roughly the same topic - or ARE THEY???
they are randomly divided into three groups (3*30)
and the words with "above average frequency" in each group are recorded
NOW
25 words occur in ALL THREE final lists - depending on the three groups they come from
so
Group A = 25 out of 34
Group B = 25 out of 34
Group C = 25 out of 30
(the remaining words occur in either one of the groups or two)

So ... I was going to use a binomial exact test and say
A: 25 out of 34 (given a probability of 0.5) is - more than would be expected by chance
B: 25 out of 34 (given a probability of 0.5) is - more than would be expected by chance
C: 25 out of 30 (given a probability of 0.5) is - more than would be expected by chance
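
A minimal sketch of that exact binomial calculation, using only the Python standard library. The helper name `binom_upper_tail` and the one-sided alternative ("more than expected") are assumptions for illustration; `p0` is left as a parameter so a null value other than 0.5 can be swapped in.

```python
from math import comb

def binom_upper_tail(k, n, p0=0.5):
    """Exact one-sided p-value: P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))

# (shared words, list length) for each group, as given above.
groups = {"A": (25, 34), "B": (25, 34), "C": (25, 30)}
p_values = {name: binom_upper_tail(k, n) for name, (k, n) in groups.items()}

for name, (k, n) in groups.items():
    print(f"Group {name}: {k}/{n}, one-sided p = {p_values[name]:.5f}")
```

Under `p0 = 0.5` all three p-values come out below 0.01. Rerunning with a larger `p0` shows how much the conclusion depends on that starting assumption.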

CLEARLY ... in each group - YES ... it IS "above chance" ... but this is a procedure that would have to be used with many groups of texts
so it does need a stat

YES - I know the REAL "probability" isn't .5
but ... I could argue that it's a "reasonable point of departure" or "assumption"

but - I'm not completely happy with that

so - not completely happy with binomials OR chi-square ...
but - it's all got to be published ...

so - was wondering what people thought