I'm quite concerned about the applicability of some non-parametric tests in the following example, because it is adapted from a defended doctoral dissertation and yet, as I understand it, contains technical flaws. My questions follow the example.

It is a within-subject design with binary count data. A sample of 30 children is observed in a task under both a control and an experimental condition. Under each condition, whether a subject makes a correct response in each of 10 trials is recorded, and the number of correct responses (hits) for each child is calculated.

The research questions are: Q1a. whether the children make more hits under the experimental condition than at chance level, as most of them make at least 6 hits, with a total of 240/300 trials "hit"; Q1b. similarly, whether they make fewer hits under the control condition than at chance level, with 120/300 trials "hit"; Q2. for each child, whether (s)he is more likely to "hit" under the experimental condition than under control; for example, Subject 1 makes 8 hits/2 misses under exp. and 5 hits/5 misses under control.

It seems that for Q1a and Q1b, a parametric (and better) way is to calculate each child's accuracy and compare the mean accuracy to 50% with a one-sample *t*-test. However, there might be too few trials (10) per subject to treat accuracy as a continuous variable. Then comes the binomial test, with observed frequencies of 240/60 or 120/180 and p = q = 0.5. But the data do not come from 300 subjects; instead, each subject was observed multiple times, within and across cells, so I think the independence assumption underlying the binomial test is violated.
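For concreteness, here is a minimal sketch (assuming scipy and numpy) of both routes just described: the per-subject accuracy *t*-test and the naive pooled binomial test. Only the total of 240/300 comes from the question; the per-subject split is simulated for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical per-subject hit counts out of 10 trials under the
# experimental condition; only the total (240 of 300) matches the question.
rng = np.random.default_rng(0)
hits = rng.multivariate_hypergeometric([10] * 30, 240)  # 30 subjects, counts sum to 240
accuracy = hits / 10

# Parametric route: one-sample t-test of mean accuracy against chance (50%).
t_res = stats.ttest_1samp(accuracy, 0.5)

# Naive pooled binomial test: treats all 300 trials as independent,
# which is exactly the assumption questioned above.
b_res = stats.binomtest(240, n=300, p=0.5)
print(t_res.pvalue, b_res.pvalue)
```

The pooled binomial p-value is astronomically small precisely because it pretends there are 300 independent observations, which is the crux of the concern.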

And for Q2, though the instinctive choice is Fisher's exact test, a similar concern arises because the frequencies result from multiple observations on the same subject.
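For reference, the per-subject Fisher's exact test can be run on Subject 1's 2x2 table from the example as follows (assuming scipy); the dependence concern applies regardless of the numerical result.

```python
from scipy import stats

# Subject 1: 8 hits / 2 misses under experimental, 5 hits / 5 misses under control.
table = [[8, 2],
         [5, 5]]
odds_ratio, p_value = stats.fisher_exact(table)  # two-sided by default
print(odds_ratio, p_value)  # odds ratio 4.0, p roughly 0.35
```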

The author not only ran the tests outlined above, but also calculated the mean number of hits across all subjects under each condition (240/30 = 8 and 120/30 = 4) and ran two additional binomial tests to see whether "the average child makes more/fewer hits than at chance level under each condition". This does not seem to fit anything I have learned in my stats classes.

And my questions are as follows. The first two are about research questions 1a and 1b.

1. Using accuracy (%) as a dependent variable, how many trials per subject do we usually need before accuracy can be treated as a continuous variable?

2. If there really are too few trials and non-parametric tests have to be used, which is the best way to answer RQs 1a and 1b in my case, with multiple observations for each subject? I did a little research and asking around, and it seems that Somers' D, the Huber variance estimator, and generalized linear mixed modeling (GLMM) are appropriate, with the latter two said to be better. I did see a study analysing a dataset of similar structure with GLMM, so is there any occasion when Somers' D is preferred?
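To show what the GLMM route looks like in practice, here is a sketch assuming statsmodels is available; the data are simulated (not the dissertation's), with each trial as one row and a random intercept per subject to absorb the within-subject dependence.

```python
import numpy as np
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

rng = np.random.default_rng(1)

# Simulate 30 subjects x 2 conditions x 10 trials in long (one-row-per-trial) format.
rows = []
for subj in range(30):
    ability = rng.normal(0, 0.5)            # subject-level random intercept
    for cond in (0, 1):                     # 0 = control, 1 = experimental
        logit = ability + (1.5 if cond else -0.5)
        p = 1 / (1 + np.exp(-logit))
        for _ in range(10):
            rows.append({"subject": subj, "condition": cond,
                         "hit": int(rng.random() < p)})
data = pd.DataFrame(rows)

# Logistic GLMM: fixed effect of condition, random intercept per subject,
# fitted by variational Bayes.
model = BinomialBayesMixedGLM.from_formula(
    "hit ~ condition", {"subject": "0 + C(subject)"}, data)
result = model.fit_vb()
print(result.summary())
```

The fixed-effect coefficient on `condition` is the condition effect on the log-odds scale, estimated while the random intercepts soak up subject-to-subject variation; this is the sense in which GLMM handles multiple observations per subject.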

Then, 3. at the individual level, what's the best way to answer RQ2? If Fisher's exact test is not acceptable here, the only option seems to be calculating Somers' D for each subject.
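If one does go the per-subject Somers' D route, scipy can compute it directly from a 2x2 table; shown here for Subject 1's counts from the example (whether this actually resolves the dependence concern is exactly my open question).

```python
from scipy import stats

# Subject 1: rows are conditions (control, experimental),
# columns are outcomes (miss, hit).
table = [[5, 5],
         [2, 8]]
res = stats.somersd(table)  # Somers' D treating the row variable as independent
print(res.statistic, res.pvalue)
```

For a 2x2 table this statistic is (up to the choice of dependent variable) the difference in hit rates between the two conditions, so here it comes out near 0.3 (= 8/10 - 5/10).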

Finally, 4. when using chi-square tests for frequency data, is it definitely unreasonable to fill the cells with averaged frequencies rather than raw observations, as the author did for "the average child"?
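To make the averaging issue concrete: a binomial test on the averaged counts (8 out of 10) throws away the sample size and gives a very different answer from the pooled counts, regardless of whether either test's assumptions hold here. A minimal comparison, assuming scipy:

```python
from scipy import stats

# Pooled counts: 240 hits out of 300 trials (independence questionable, as discussed).
pooled = stats.binomtest(240, n=300, p=0.5)

# "Average child" counts as the author used them: 8 hits out of 10.
averaged = stats.binomtest(8, n=10, p=0.5)

print(pooled.pvalue, averaged.pvalue)  # the averaged test is not even significant
```

The averaged test gives p = 112/1024 ≈ 0.109, so the same hit rate that looks overwhelming in the pooled data is non-significant once the counts are divided by 30 — averaging changes n, not just the cell entries.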

Thanks in advance for your help.

Meng