# Help! Newbie Statistician doing volunteer work

#### Porky Pig

##### New Member
Hello,

I'm working as a volunteer on a patient-led project to assess the effectiveness of a newly released medical device, and I'm hoping to get some help from the experts on the forum here, mainly to see if I am on the right track.

I have a mathematics background and did some stats at Uni but that was quite a while ago. To get myself up to speed I have been furiously reading stats books! I am using Excel and Real Statistics (http://www.real-statistics.com/), which I am finding quite adequate.

Anyway, here is the problem I am addressing:

We have a number of people in our forum who have volunteered to provide feedback on the new device. We ask the users to rate the severity of the condition between 1 and 5 (let's say Severity scale) and also (optionally) to answer a questionnaire which gives a severity result between 1 and 100 (let's say it is the ABC scale). Most people are answering the optional part (we made it optional as we thought we might otherwise have too many dropouts, but this has not proven to be the case and in retrospect we should probably have made it mandatory!). According to the literature, a change of 13 points or more on the ABC scale is supposed to be a significant change.

We've also asked a heap of questions about various factors, such as Age, the duration of the condition, Gender and quite a few others (about 12 in all). Each factor is divided into mostly 4 groups.

They answer the severity questions at 0,6 and 12 weeks of treatment with the new device, and the various factor questions at 0 weeks.

We will most likely, after drop outs, have data to 12 weeks for about 35 patients (which I know is not a great sample size!). We are about 50% of the way through at the moment.

These are the tests I am doing;

1. One-factor ANOVA on the ABC scale deltas at 6 and 12 weeks (delta from 0 weeks) against each of the factors, and then Tukey HSD with Bonferroni correction for contrasts (4 of 12 factors are significant to date).
2. One-tailed paired t-tests on the ABC scale at 6 and 12 weeks from 0 weeks. Both are significant (although the change from 6 to 12 weeks is not, and is actually going in the wrong direction).
3. Wilcoxon Signed Rank paired test on the changes in the Severity Scale at 6 and 12 weeks from 0 weeks (again both are significant).
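
For anyone following along without Real Statistics, test 3 in the list above can be sketched in plain Python roughly as follows. This is only an illustration under assumptions: the function name `wilcoxon_signed_rank` and the example ratings are made up, and the p-value uses the normal approximation with a tie correction and no continuity correction, so it may differ slightly from what a statistics package reports.

```python
from math import sqrt
from statistics import NormalDist


def wilcoxon_signed_rank(before, after):
    """One-sided Wilcoxon signed-rank test (alternative: before > after).

    Zero differences are dropped. Returns (W_plus, p_value), where the
    p-value comes from a tie-corrected normal approximation.
    """
    diffs = [b - a for b, a in zip(before, after) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1  # size of this tie group
        tie_term += t**3 - t
        i = j + 1

    # W+ is the sum of ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / sqrt(var)
    return w_plus, 1 - NormalDist().cdf(z)


# Hypothetical 1-5 severity ratings at week 0 and week 12 (not project data).
before = [4, 5, 3, 4, 5, 4, 3, 5]
after = [2, 3, 3, 3, 2, 2, 1, 4]
w_plus, p_value = wilcoxon_signed_rank(before, after)
print(w_plus, p_value)
```

With a 1 to 5 scale there will be many tied differences, which is why the sketch includes the tie correction in the variance.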

I am also keen to use tests that are not unduly influenced by outliers. For example, one patient has recorded a drop from about 95 to about 20 on the ABC scale. As such, I am also classifying patients into those who experienced a significant drop (13 points or more) and those who did not (i.e. a 1 or a 0). With this data I have used the following test:

1. Fisher test to determine independence of the variable describing significance and the variable describing each factor. I have used this rather than the chi-squared test as the former supports smaller sample sizes (3 factors are significant).
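
To make the responder classification and the Fisher test concrete, here is a minimal sketch in plain Python. It is a simplification under assumptions: the names `is_responder` and `fisher_exact_2x2`, the two-level factor and the patient records are all hypothetical, and it handles only a 2x2 table. Factors with four groups would need the r x c generalisation (Fisher-Freeman-Halton), which this sketch does not implement.

```python
from math import comb

THRESHOLD = 13  # ABC-scale change quoted in the thread as significant


def is_responder(baseline, followup, threshold=THRESHOLD):
    """True if the ABC score dropped by at least `threshold` points."""
    return (baseline - followup) >= threshold


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    p_value = 0.0
    # Sum over every table that is at least as unlikely as the observed one.
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-9):  # small tolerance for float rounding
            p_value += p
    return p_value


# Hypothetical records: (group, ABC at week 0, ABC at week 12).
patients = [
    ("F", 60, 40), ("F", 55, 30), ("F", 70, 50),
    ("M", 50, 45), ("M", 65, 60), ("M", 40, 38),
]

counts = {}
for group, base, wk12 in patients:
    key = (group, is_responder(base, wk12))
    counts[key] = counts.get(key, 0) + 1

a = counts.get(("F", True), 0)   # F responders
b = counts.get(("F", False), 0)  # F non-responders
c = counts.get(("M", True), 0)   # M responders
d = counts.get(("M", False), 0)  # M non-responders
p_value = fisher_exact_2x2(a, b, c, d)
print([[a, b], [c, d]], p_value)
```

Because the outcome is reduced to 1 or 0, a single extreme change such as 95 to 20 counts the same as any other drop of 13 points or more.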

I would also like to use another test which will do the equivalent of the ANOVA, but for the significant/not significant dependent variable (rather than the ABC scale).

I originally investigated Logistic Regression, which has the advantage that it can be used as a predictor, but unfortunately the sample size has to be around 100.
The other test I thought about using was the Kruskal-Wallis + Dunn Test (as the significant/not significant variable is binary it is definitely not Normal). I've done one of these tests and am getting reasonable results, but I'm just wondering if this is the right thing to do?

EDIT: After further reading I have my doubts about using KW in this way. Not sure how I could focus on the patients who have achieved a significant change rather than the mean (which could be heavily influenced by one patient).

Anyway, if anyone can help with this project (the Kruskal-Wallis question and the approach in general), it would be much appreciated. It's for a good cause, the condition can be debilitating, and it's run by patients rather than medical companies!

#### GretaGarbo

##### Human

We ask the users to rate the severity of the condition between 1 and 5 (let's say Severity scale) and also (optionally) to answer a questionnaire which gives a severity result between 1 and 100 (let's say it is the ABC scale).
So you have two dependent variables, severity and ABC scale.

about various factors, such as Age, the duration of the condition, Gender and quite a few others (about 12 in all).
So you have a number of explanatory variables.

Use them in a multiple linear regression model. You will probably need to drop many variables that do not have any explanatory power. But maybe four or five can be used in a preferred model.

Each factor is divided into mostly 4 groups.
Please don't do that. That will cause bias and measurement error. Keep the original values.

They answer the severity questions at 0,6 and 12 weeks
So there is also repeated measures.

So you will need to have "time" as an explanatory variable, and the patient as a "random intercept" in a mixed model.
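
One common way to write down a random-intercept model for this kind of design is sketched below. The notation is an assumption for illustration, not something given in the thread: $y_{it}$ is the ABC score of patient $i$ at week $t$, $\mathbf{x}_i$ holds the baseline factors, week 0 is the reference time point, and $u_i$ is the patient-specific intercept.

```latex
y_{it} = \beta_0 + \beta_1\,\mathbb{1}[t = 6] + \beta_2\,\mathbb{1}[t = 12]
       + \mathbf{x}_i^{\top}\boldsymbol{\gamma} + u_i + \varepsilon_{it},
\qquad u_i \sim N(0, \sigma_u^2), \quad \varepsilon_{it} \sim N(0, \sigma^2)
```

Here $\beta_1$ and $\beta_2$ are the average changes from baseline at 6 and 12 weeks, and $u_i$ accounts for repeated measurements on the same patient being correlated.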

one patient has recorded a drop from about 95 to about 20 on the ABC scale.
It could be an outlier or not be one.

for about 35 patients (which I know is not a great sample size!).
Really? I know someone who had 7 patients and got good results.

Logistic Regression, .... but unfortunately the sample size has to be around 100.
Who said that nonsense? Of course you can have fewer, but everything is estimated with a variance.

#### Porky Pig

##### New Member
Thanks GretaGarbo,

Yes, you are right. Maybe I should have just asked if someone can donate 30 minutes of their time on a Skype call. If someone can, that would be very much appreciated!

So you have two dependent variables, severity and ABC scale.
Yes

So you have a number of explanatory variables.

Use them in a multiple linear regression model. You will probably need to drop many variables that do not have any explanatory power. But maybe four or five can be used in a preferred model.
I think for Linear Regression (or multiple linear regression) the variables need to be continuous? Mine are mostly nominal (categories with no order) or Ordinal (categories with Logical order or Rank Order).

Please don't do that. That will cause bias and measurement error. Keep the original values.
That's how the questionnaire was written. I get responses in four categories.

Who said that nonsense? Of course you can have fewer, but everything is estimated with a variance.
Here:

https://www.researchgate.net/post/W..._sample_size_for_running_logistic_regression2

#### GretaGarbo

##### Human
Maybe I should have just asked if someone can donate 30 minutes of their time on a Skype call.
No, you should not. We would just ask you how much you would pay, and then we would ignore you. By the way, the idea with this site is that everybody can read and learn. It is not a consultancy site. (But maybe it would be profitable?)

I think for Linear Regression (or multiple linear regression) the variables need to be continuous?
Well, they need to be "quantitative". A number of events, like 1, 2, 3, 4, is discrete and not continuous, and can be used in linear regression.

Mine are mostly nominal (categories with no order) or Ordinal (categories with Logical order or Rank Order).
If your software is any good it will create dummy variables when you declare them to be "categorical" or something.
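
As a rough illustration of what such software does behind the scenes, here is a small plain-Python sketch of dummy coding. The helper `dummy_code` and the age-band labels are hypothetical, not taken from the questionnaire. One level is kept as the reference and gets no column of its own.

```python
def dummy_code(values, reference=None):
    """Turn a categorical variable into 0/1 indicator columns.

    The reference level is dropped, so k levels give k - 1 columns.
    Returns (column_names, rows).
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]
    kept = [level for level in levels if level != reference]
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows


# Hypothetical age bands for five patients.
age_band = ["18-30", "31-50", "51-70", "31-50", "70+"]
names, rows = dummy_code(age_band, reference="18-30")
print(names)
for row in rows:
    print(row)
```

Each coefficient on a dummy column is then read as the difference between that group and the reference group.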

That's how the questionnaire was written. I get responses in four categories.
It would have been better to have the actual age and not "young" "middle" "old" etc.

That is just Wrong! Don't trust "Researchgate". But that goes for most sites. Including this one.