Calculating Chi-Square/Fisher exact Test with differing (sub-)sample sizes and interpreting them (retrospectively)

I'm sorry, I realise how long this is! I tried to highlight the questions in bold, but I don't know how helpful it would be to ask them without providing background information first. My deepest gratitude to anybody who is still willing to read through this =)

Reading a recently published paper that I thought might be interesting has turned into something of a statistical mini-project for me. I am trying to understand the statistical analysis of the data, which has basically been published in full. I thought I could retrace the steps of the analysis in order to understand it more clearly, but so far, no luck.

I'm not sure if it's allowed to post the DOI (it's a free-access paper), so I will try to explain it as best I can myself:

The general theme is the classification of "swallowing disorders" in different neurological illnesses. More precisely, the authors formulated 7 specific swallowing symptoms (phenotypes) and said their aim was to validate this classification system in a cohort of different patient groups (Alzheimer's, stroke, ...). N was 1-5 for a few of these sub-groups.
This was done by rating video material of 1012 patients' swallowing assessments: the rater classified each patient into one of the 7 phenotypes. Since the video material was randomly chosen, some patients had no swallowing disorder, so each patient was actually assigned to 1 of 8 categories, the 8th being "no disorder".

They say "the distribution of dysphagia phenotypes depending on the underlying disorder was evaluated in a cross table (observed counts) and matched with the expected counts according to Chi-Squared statistics [for each patient group with n≥ 5...] in cases of expected counts <5, the Fisher exact test was used."

OK, so I read up on the Fisher and Chi-Squared tests. They used the Chi-Square Test of Independence, which analyses whether or not there is a significant association between 2 categorical variables, i.e. whether specific dysphagia phenotypes (incl. "no disorder") are significantly associated with specific patient groups. I also read that the Fisher exact test is often used for a 2x2 contingency table.

Now, the results section is a bit short in terms of written results. They basically say there is a Table A with "the exact counts of phenotypes as well as the expected counts according to the Chi-Squared statistics throughout the different diagnostic groups". This is about distribution, according to the authors.
There is also a Table B with "results of the Chi-Squared or Fisher exact tests for the relation between phenotype and each disorder as well as sensitivity and specificity to predict the underlying disease for each phenotype". This is about association, according to the authors.

So, Table A includes the raw data for each group (excerpt):
Table A.PNG

I've been able to put the data into my own spreadsheet to calculate the expected values based on Chi-Square, and they seem to match the authors' data. But then I started wondering whether Chi-Square is actually admissible here, because in some of these subgroups n=1, so there are definitely a lot of expected values under 5. I searched for a way to determine expected values through the Fisher exact test or by any other method, but couldn't find anything yet (Help?)
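For anyone who wants to check the arithmetic, here is a minimal sketch of the expected-count calculation, assuming the usual formula (row total x column total / N). The counts are made up for illustration and are NOT the paper's data, and the function names are my own. As far as I understand, Fisher's exact test conditions on the same row and column totals, so it would not give a different set of expected counts; what changes is how the p-value is obtained.

```python
# Toy counts, NOT the paper's data: rows are diagnostic groups,
# columns are phenotype categories (the last row mimics an n=1 group).
toy_table = [
    [12, 3, 1],
    [20, 30, 10],
    [1, 0, 0],
]


def expected_counts(observed):
    """Expected count per cell under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Return (X2, degrees of freedom) for the whole table."""
    expected = expected_counts(observed)
    x2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
        if e > 0  # skip cells whose row or column total is zero
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return x2, df


def cells_below(expected, threshold=5):
    """Count cells whose expected value is below the rule-of-thumb threshold."""
    return sum(e < threshold for row in expected for e in row)


expected = expected_counts(toy_table)
x2, df = chi_square_statistic(toy_table)
print([[round(e, 2) for e in row] for row in expected])
print("X2 =", round(x2, 2), "df =", df, "cells with expected < 5:", cells_below(expected))
```

Even in this small toy table, the n=1 row pushes several expected counts below 5, which is the situation I am worried about.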


And then... there is Table B (excerpt):

table B.PNG

This table is just a conundrum to me. It is the "p value of the Chi-Square test of Independence or in case of expected counts < 5 of the Fisher Exact Test and Sensitivity and Specificity [...]". I won't even try to understand the sensitivity/specificity thing for now.
As you can see, every pair of phenotype and disease sub-group has its own p-value. Based on what I've read about the Chi-Square test, I would have expected one p-value for a whole row-by-column table (plus the df and the X2 value itself).

I've tried working through the steps of the Chi-Square test (excerpt from the Alzheimer's group):
picture spreadsheet.PNG
but this is only possible for some of the subgroups (n>5), and it would also give me one Chi-Square value and p-value for the whole subgroup/group, not one for every single pair in the data. So what is the starting point for getting all these different p-values?
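One possible starting point, and this is only my guess rather than something the paper states, would be to collapse the full table into a separate 2x2 table for each pair: this disease vs. all other diseases, and this phenotype vs. all other categories. Each 2x2 table then gets its own test. The sketch below uses made-up counts (NOT the paper's data), my own function names, and a hand-written two-sided Fisher exact test built on the hypergeometric distribution.

```python
from math import comb

# Toy counts, NOT the paper's data: rows = diagnostic groups, columns = phenotypes.
toy_table = [
    [12, 3, 1],
    [20, 30, 10],
    [1, 0, 0],
]


def collapse_to_2x2(table, i, j):
    """Collapse an r x c table into (a, b, c, d) for group i and phenotype j.

    a = group i with phenotype j       b = group i, any other phenotype
    c = other groups with phenotype j  d = other groups, other phenotypes
    """
    a = table[i][j]
    b = sum(table[i]) - a
    c = sum(row[j] for row in table) - a
    d = sum(sum(row) for row in table) - a - b - c
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    total = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * tolerance
    )
    return min(1.0, total)


# One p-value per (group, phenotype) pair, as in a per-cell results table.
for i, row in enumerate(toy_table):
    for j, _ in enumerate(row):
        cells = collapse_to_2x2(toy_table, i, j)
        print(i, j, cells, round(fisher_exact_two_sided(*cells), 4))
```

If this is what was done, each p-value would answer "is this phenotype more or less common in this group than in everyone else?", and the number of tests would equal the number of cells, which raises the question of multiple comparisons.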

I'm not even sure how to interpret the p-values at this point: is Alzheimer's significantly associated with Phenotype I but not with II, III, etc.? But then most of the Alzheimer's patients had "no disorder" in Table A, which is not presented in Table B. Doesn't that mean that the N is reduced by the 12 "normal swallowing" patients, so that I would need the Fisher Exact Test instead of the Chi-Square, because the new N=4 and there are multiple "zero frequencies"?
Also, how does this link back to the aim of validating the classification system? As far as I understand it, the aim of this statistical analysis is not to validate the subgroups or phenotypes against any outside information but to analyse occurrence within a subgroup. Or maybe this is where I'm going wrong?


Also, how can I start analysing the n<5 data with the Fisher Exact Test? In my head at least, it's either a 1x7 table within a subgroup or the whole 25x8 contingency table across subgroups (Help?)
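For the whole-table case, one option I have come across is a Monte Carlo (permutation) test. To be clear, this is a swapped-in technique, not Fisher's exact test itself and not necessarily what the authors did. The idea is to keep the row and column totals fixed, shuffle the phenotype labels across patients many times, and count how often the shuffled table gives a Chi-Square statistic at least as large as the observed one. A minimal sketch with my own function names; the default of 2000 simulations is an arbitrary choice:

```python
import random


def chi_square_statistic(table):
    """Pearson X2 for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected > 0:  # skip cells with an empty row or column
                x2 += (observed - expected) ** 2 / expected
    return x2


def monte_carlo_p_value(table, n_sim=2000, seed=0):
    """Permutation p-value for independence with both margins held fixed."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])

    # Expand the table into one (group, phenotype) label pair per patient.
    groups, phenotypes = [], []
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            groups.extend([i] * count)
            phenotypes.extend([j] * count)

    observed_x2 = chi_square_statistic(table)
    at_least_as_extreme = 0
    for _ in range(n_sim):
        rng.shuffle(phenotypes)  # breaks any group-phenotype association
        simulated = [[0] * n_cols for _ in range(n_rows)]
        for g, p in zip(groups, phenotypes):
            simulated[g][p] += 1
        if chi_square_statistic(simulated) >= observed_x2 - 1e-9:
            at_least_as_extreme += 1

    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_sim + 1)


# Toy counts, NOT the paper's data.
toy_table = [
    [12, 3, 1],
    [20, 30, 10],
    [1, 0, 0],
]
print(monte_carlo_p_value(toy_table))
```

This gives a single p-value for the whole table regardless of how small the expected counts are, so it would answer "is there any association at all?" but not the per-pair question.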
 
