> Please describe in more detail what these data represent.

Oh, I was afraid someone would ask me that

OK ...

I'll try ...

I'm trying to establish a "lexical corpus consistency" approach ...

That is ... to what degree can a corpus of texts be said to be lexically consistent?

and there are many aspects to this ...

and this is just one of them ...

There are 90 different texts on roughly the same topic - or ARE THEY???

they are randomly divided into three groups (3*30)

and the words with "above average frequency" in each group are recorded
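The post doesn't say exactly how "above average frequency" is computed, so here is a minimal Python sketch under two loudly stated assumptions: the 90 texts are already tokenised into lists of words, and a word counts as "above average" in a group if its pooled count is greater than the mean count per distinct word in that group. Both function names (split_into_groups, above_average_words) are hypothetical, for illustration only.

```python
import random
from collections import Counter
from typing import List, Sequence, Set


def split_into_groups(texts: Sequence[List[str]], n_groups: int = 3,
                      seed: int = 0) -> List[List[List[str]]]:
    """Hypothetical helper: randomly split the texts into n_groups equal groups.
    Texts left over when len(texts) is not divisible by n_groups are dropped."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(texts)
    rng.shuffle(shuffled)
    size = len(shuffled) // n_groups
    return [shuffled[i * size:(i + 1) * size] for i in range(n_groups)]


def above_average_words(group: Sequence[List[str]]) -> Set[str]:
    """ASSUMPTION: "above average" = pooled count in the group is greater than
    the mean count per distinct word type in that group."""
    counts = Counter()
    for tokens in group:
        counts.update(tokens)  # pool the word counts of every text in the group
    if not counts:
        return set()
    mean = sum(counts.values()) / len(counts)  # tokens per distinct word type
    return {word for word, n in counts.items() if n > mean}


# toy usage: three tiny "texts"
toy_group = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat"]]
print(above_average_words(toy_group))  # {'the'}
```

With 90 texts, split_into_groups(texts) returns three lists of 30, and calling above_average_words on each one gives the three word lists that get compared.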

NOW

25 words occur in ALL THREE final lists - but each group's list is a different length, so the proportion depends on which group you look at

so

Group A = 25 out of 34

Group B = 25 out of 34

Group C = 25 out of 30

(the remaining words occur in only one or two of the groups)
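Counting the overlap is then a set intersection. A minimal sketch, assuming each group's "above average" list is held as a Python set of words; shared_counts is a hypothetical name and the toy lists only stand in for the real ones:

```python
from typing import Dict, Set, Tuple


def shared_counts(word_lists: Dict[str, Set[str]]) -> Dict[str, Tuple[int, int]]:
    """For each group, return (number of words shared by ALL groups,
    length of that group's own list)."""
    shared = set.intersection(*word_lists.values())  # words present in every list
    return {name: (len(shared), len(words)) for name, words in word_lists.items()}


toy = {
    "A": {"a", "b", "c", "d"},
    "B": {"a", "b", "c", "e"},
    "C": {"a", "b", "f"},
}
print(shared_counts(toy))  # {'A': (2, 4), 'B': (2, 4), 'C': (2, 3)}
```

With the real lists this would give (25, 34), (25, 34) and (25, 30).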

So ... I was going to use an exact binomial test and say

A: 25 out of 34 (given a probability of 0.5) is more than would be expected by chance

B: 25 out of 34 (given a probability of 0.5) is more than would be expected by chance

C: 25 out of 30 (given a probability of 0.5) is more than would be expected by chance
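For reference, the one-sided exact binomial test behind those three statements is the upper tail P(X >= k) for X ~ Binomial(n, 0.5); one-sided because the claim is "more than expected". A minimal standard-library sketch (the function name is hypothetical); if SciPy 1.7 or later is available, scipy.stats.binomtest(25, 34, p=0.5, alternative="greater").pvalue should give the same number:

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """One-sided exact binomial p-value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


for group, (k, n) in {"A": (25, 34), "B": (25, 34), "C": (25, 30)}.items():
    print(f"{group}: {k}/{n}  p = {binomial_upper_tail(k, n):.5f}")
```

This works out at roughly p = 0.0045 for 25/34 and p = 0.00016 for 25/30.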

CLEARLY ... in each group - YES ... it IS "above chance" ... but this is a procedure that would have to be used with many groups of texts

so it does need a proper statistical test

YES - I know the REAL "probability" isn't .5

but ... I could argue that it's a "reasonable point of departure" or "assumption"
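Since 0.5 is only a starting assumption, one way to see how much rests on it is to redo the same tail calculation under other null probabilities and find the largest one at which 25/34 and 25/30 would still be significant. A minimal sketch: the 0.6 and 0.7 values and the 0.05 threshold are arbitrary illustrations, not estimates of the real probability, and largest_significant_null is a hypothetical helper:

```python
from math import comb
from typing import Optional


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def largest_significant_null(k: int, n: int, alpha: float = 0.05,
                             steps: int = 1000) -> Optional[float]:
    """Scan null probabilities j/steps upwards and return the largest one for which
    P(X >= k) is still below alpha (None if even the smallest grid value fails)."""
    best = None
    for j in range(1, steps):
        p0 = j / steps
        if binomial_upper_tail(k, n, p0) < alpha:
            best = p0
        else:
            break  # the upper tail only grows as p0 increases, so stop here
    return best


for k, n in [(25, 34), (25, 30)]:
    for p0 in (0.5, 0.6, 0.7):
        print(f"{k}/{n}  null p = {p0}:  P(X >= {k}) = {binomial_upper_tail(k, n, p0):.4f}")
    print(f"{k}/{n}  largest null p still significant at 0.05: "
          f"{largest_significant_null(k, n)}")
```

If that break-even value sits well above any plausible "real" probability, the 0.5 assumption matters less; if it is close, the conclusion depends heavily on it.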

but - I'm not completely happy with that

so - not completely happy with binomial tests OR chi-square ...

but - it's all got to be published ...

so - I was wondering what people thought