Help me pick statistical tests to analyze this data (medical)

#1
I'm a Neurology resident working on a project studying socioeconomic and neuropsychiatric data (burnout, depression, anxiety, etc.) on physicians compared to the general population in my country (quite similar to this study: Burnout and satisfaction with work-life balance among US physicians relative to the general US population).

To do this I created 2 questionnaires, one for physicians and one for the general population. I exported the answers to Excel and did some basic analysis (mostly percentages). Obviously, physicians had poorer results in all tests. I now want to find correlations such as:
  • years practicing an occupation, salary, working hours/week, etc and neuropsychiatric scale results
  • compare those results between physicians and general population
  • etc
I'm quite inexperienced in statistical analysis and this is but my second study in my career, so I need some help. I want to know which tests I should use. I'm aware of chi², and I read that I should also be using ANOVA and MANOVA, but I still need to get into them. Could anyone point me to the tests I should research? I can post a sample of my database or the parameters I'm studying if this needs clarification.
 

Karabiner

TS Contributor
#2
Maybe you could do a web search for "statistical tests decision tree" or something like that.
You can find two examples of such trees here.

With kind regards

Karabiner
 

hlsmith

Not a robit
#3
Well, make sure you are not going to just compare everything possible between the two vocational groups. Post your primary study question and which variables you would examine to answer the question. Don't forget to post your sample size and how all the variables are formatted.
 
#4
Very nice site. Those trees are really helpful for me! I'm also using this site to learn: https://researchbasics.education.uconn.edu/


I've got another question I couldn't answer by googling. In my questionnaire I had to use ranges for salary (e.g.: $ <12500 / $12501 - 20000 / etc). I was told these "ranges" can't be analyzed statistically, but I doubt that's true. Is there any way to analyze these ranges? Can't I just add another row with numeric values (0 = <12500 / 1 = 12501 - 20000 / etc) and analyze it that way? Wouldn't my ranges work as a categorical variable?
 

Karabiner

TS Contributor
#5
Salary is an ordinal variable then. You could recode it if you like, as long as you stay aware
of the fact that 0 1 2 etc. represent not real numbers but levels on an ordinal scale.
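
A minimal sketch of such a recoding, assuming the answers are stored as text labels. Only the first two bands come from the thread; the third band and the sample answers are made up for illustration.

```python
# Salary bands in ascending order. The first two are from the questionnaire
# as described in the thread; the third is a hypothetical placeholder.
SALARY_BANDS = ["<12500", "12501-20000", "20001-30000"]

# Map each band label to its position on the ordinal scale (0, 1, 2, ...).
BAND_TO_LEVEL = {label: level for level, label in enumerate(SALARY_BANDS)}


def recode_salary(answers):
    """Return the ordinal level for each band label in `answers`."""
    return [BAND_TO_LEVEL[answer] for answer in answers]


# Made-up answers, for illustration only.
answers = ["12501-20000", "<12500", "20001-30000", "<12500"]
levels = recode_salary(answers)
print(levels)  # [1, 0, 2, 0]
```

The codes only carry order: level 2 is higher than level 1, but it is not "twice" level 1, so summaries based on ranks or order are appropriate and a mean of the codes is not a salary.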

With kind regards

Karabiner
 
#6
In my questionnaire I had to use ranges for salary (e.g.: $ <12500 / $12501 - 20000 / etc)
Salary is an ordinal variable then.
Yes, it is an ordinal variable. But it is also a ratio scale variable. (All ratio scale variables, like "weight", are also interval scaled, ordinal and nominal.)

It is an interval censored variable (it is censored in the intervals of e.g. $12501 - 20000) and it can in principle be estimated by maximum likelihood (provided that a distribution that fits the salary data can be found), or by "regression on order statistics". Search for Helsel and the R package NADA.

All continuous variables that are rounded to a few numbers are in principle interval censored. But most of the time it can be ignored.

It is not uncommon to use the class midpoint in a salary scale like the above. (There are some difficulties in the highest and lowest class.) But maybe it is safest to do as @Karabiner suggests.
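
A small sketch of the class-midpoint idea, assuming each band is stored with its lower and upper limit. The top band is hypothetical, and the open-ended classes are deliberately left without a midpoint, since choosing a value for them is exactly the difficulty mentioned above.

```python
# Each band as (lower, upper); None marks an open end.
# The first two bands are from the thread; the top band is hypothetical.
BANDS = {
    "<12500": (None, 12500),
    "12501-20000": (12501, 20000),
    ">20000": (20001, None),
}


def class_midpoint(label):
    """Return the midpoint of a closed band, or None for an open-ended band."""
    lower, upper = BANDS[label]
    if lower is None or upper is None:
        return None  # no defensible midpoint without an extra assumption
    return (lower + upper) / 2


midpoints = [class_midpoint(label) for label in BANDS]
print(midpoints)  # [None, 16250.5, None]
```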

But it would have been easier to have asked the respondent directly. But this is what happens when someone is not considering in advance how the data are supposed to be analysed. That is called experimental design and planning.
 
#7
Yes, I know the design is not the best, but this wasn't up to me, frankly. I wanted to just get the real value, but my boss wanted the questionnaire to be easier to answer (e.g. by making salary a multiple-choice question with ranges).
Salary is an ordinal variable then. You could recode it if you like, as long as you stay aware
of the fact that 0 1 2 etc. represent not real numbers but levels on an ordinal scale.

With kind regards

Karabiner
Alright, but then how should I analyze that? Suppose I want to look for a correlation between low income and depression. What should I use if I use this recoding of salaries? Chi², since they now "became" categorical?
 
#8
To make it easier for you to understand the type of data I have, I made this table:



And there are 4 groups of subjects:
1) Health care (for example: secretaries, physicians, nurses, etc)
2) Other occupations
3) Physicians
4) Residents

What I basically want to do is find associations between burnout AND salary, burnout AND working hours/week, burnout AND age, etc. Same for depression and anxiety. I also want to compare the groups against each other (my hypothesis is that health care personnel have more depression than other occupations, residents have more than other occupations, etc).
 

Karabiner

TS Contributor
#9
Ordinal scale is not categorical scale. Probably level of education, medical rank (!), or working hours
also aren't categorical. I would like to suggest that you perhaps make yourself familiar with the idea
of ordinal scales.

Regarding the types of analyses that are feasible, you could perhaps do a web search for
"statistical tests decision tree"
or something like that. One example you can find here: https://www.utwente.nl/en/bms/m-store/dataanalysis/

With kind regards

Karabiner
 
#10
Yes, I'm using those 2 trees, but I guess clearly I can't even understand variables, even though I read about them on various sites and watched YT videos...

For instance, if I want to find a correlation between age and depression, I still can't understand what test to use following those trees. Same goes for sex and depression.
 

Karabiner

TS Contributor
#11
The first thing is the scale on which variables were measured, for example categorical (sex, hair colour, nationality),
ordinal (a.k.a. ranked, ordered categorical, such as responses on a 5-point rating scale, income grouped in ordered
categories, educational level), and interval (age; real, not categorized income; blood pressure). The second is whether you
want to compare independent groups (blood pressure in in-patients versus out-patients), or you want to compare
dependent measures (e.g. blood pressure of the same people on day 1 vs. day 2; rating of satisfaction with therapist A
versus rating of satisfaction with therapist B, both ratings made by the same people), or you want to have
correlations.
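
For the correlation case, a rank correlation such as Spearman's is one commonly used option when both variables are at least ordinal (for example recoded salary levels and a depression score). This is an editorial illustration and not something stated in the thread; the data are made up and the helper names are hypothetical. It is a plain standard-library sketch that returns the coefficient only, without a p-value.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up data: salary level (0 = lowest band) and a depression score.
salary_level = [0, 0, 1, 2, 2, 3, 4, 5]
depression_score = [14, 11, 12, 9, 10, 7, 8, 4]
rho = spearman(salary_level, depression_score)
print(round(rho, 3))
```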

Problems of association between a categorical variable and other variables can be treated as a comparison problem
rather than a correlation problem.
E.g. to see whether sex and depression are associated, you can check whether the group of males and the group of females
differ with respect to depression. Which test you can use then depends on the measurement scale of depression:
an ordinal scale (single rating item), an interval scale (depression inventory),
or just categorical (depression yes/no) each lead to different tests.

With kind regards

Karabiner
 
#12
Alright, seems I'm beginning to understand. So, the way salary was measured in my questionnaire is, then, ordinal.

Burnout, depression and anxiety were measured on an interval scale. I used inventories that give scores for answers, and those scores then map to "not having depression", being "borderline for depression" or "having depression" (it's called the "HAD scale"). It's similar for burnout, but with different categories (no burnout / risk of burnout / complete burnout), and this inventory is a little more complicated as it has 3 subscales. And then I have 2 other areas, subjective memory and quality of life, which were measured with questions that add up to a score (the higher the worse for subjective memory, the lower the worse for QOL).
So:
- Burnout, depression and anxiety --> Interval scale?
--- I also added a separate column with these scales summarized to "No / borderline / Yes" for each subject, so that may be easier to analyze, right? Then I can sort them as "categorical", if I understood correctly.
- SM and QOL --> ordinal scale.



Following the decision trees.
- If I was to analyze sex and depression --> I could use Chi²?
- If I was to analyze sex and QOL --> then I don't know which one to use, because I have sex (categorical) and QOL (scale)
 

Karabiner

TS Contributor
#13
Burnout, depression and anxiety were measured on an interval scale.
Ok, interval scaled then.
--- I also added a separate column with these scales summarized to "No / borderline / Yes" for each subject, so that may be easier to analyze, right?
Not as far as I can see. But it depends on your exact research interests/research questions.
Then I can sort them as "categorical", if I understood correctly.
This is borderline in my opinion; one could perhaps consider it as ordinal, but categorical is not wrong.
SM and QOL --> ordinal scale.
I thought it might perhaps be interval, but ordinal is maybe not wrong.
- If I was to analyze sex and depression --> I could use Chi²?
Chi² is used if both variables are categorical. If you want to analyse the difference between men and
women with respect to the original depression scale (interval), then a t-test would be the first choice.
If you want to analyse the association between sex and your categorization of depression (yes/maybe/no),
then Chi² looks o.k.
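
As a rough illustration of the Chi² case, the sketch below computes the Pearson chi-square statistic and its degrees of freedom for a sex by depression-category table. The counts are made up, and the function name is hypothetical. The p-value would then be read from a chi-square distribution with those degrees of freedom, which a statistics package does for you.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a count table.

    `table` is a list of rows of observed counts, e.g. one row per sex and
    one column per depression category.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Made-up counts. Rows: men, women. Columns: no / borderline / yes.
table = [
    [20, 10, 10],
    [10, 10, 20],
]
statistic, df = chi_square_statistic(table)
print(round(statistic, 3), df)  # 6.667 2
```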
- If I was to analyze sex and QOL --> then I don't know which one to use, because I have sex (categorical) and QOL (scale)
Well, you said QoL is ordinal. A comparison between 2 groups with regard to an ordinal scale
can be performed using the Mann-Whitney U-test / Wilcoxon rank sum test.
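
A minimal sketch of the Mann-Whitney U-test using only the standard library. The QoL scores are made up, the function name is hypothetical, and the p-value uses the large-sample normal approximation with a tie correction and no continuity correction, so for small samples an exact test from a statistics package would be preferable.

```python
from collections import Counter
from statistics import NormalDist


def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b, two-sided p) using the normal approximation."""
    n1, n2 = len(group_a), len(group_b)

    # U_a counts the pairs in which the value from group A is larger;
    # tied pairs count as one half.
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = n1 * n2 - u_a

    # Variance of U under the null hypothesis, corrected for ties.
    n = n1 + n2
    tie_counts = Counter(list(group_a) + list(group_b)).values()
    tie_term = sum(t ** 3 - t for t in tie_counts)
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))

    z = (u_a - n1 * n2 / 2) / variance ** 0.5
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, u_b, p_value


# Made-up QoL scores for two groups.
qol_men = [12, 15, 15, 18, 20, 22, 25]
qol_women = [10, 11, 12, 14, 15, 16, 17]
u_men, u_women, p = mann_whitney_u(qol_men, qol_women)
print(u_men, u_women, round(p, 3))
```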

With kind regards

Karabiner
 
#14
Thanks, I'm doing the calculations now. I've got another question. Is it ok to use crosstabs with chi² and Fisher exact test for the following:
- Burnout (Yes/No)
- Salary ranges (< 12500 / 12501-20000 / etc, total of 8 ranges)

and looking at the p-value for "Linear-by-Linear Association"? I did some research and people recommended doing that, but I may be getting some weird results.
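
For reference, the linear-by-linear association statistic is commonly computed as M² = (n - 1) r², where r is the Pearson correlation between the row scores and the column scores, and it is compared to a chi-square distribution with 1 degree of freedom. The sketch below assumes equally spaced scores (0/1 for burnout, 0 to 7 for the salary bands) and uses made-up counts; the function name is hypothetical.

```python
import math


def linear_by_linear(table, row_scores, col_scores):
    """Return (r, M2, p) for the linear-by-linear association test.

    r is the Pearson correlation between row and column scores weighted
    by the cell counts; M2 = (n - 1) * r**2 has 1 degree of freedom.
    """
    n = sum(sum(row) for row in table)
    mean_row = sum(row_scores[i] * sum(row) for i, row in enumerate(table)) / n
    mean_col = sum(
        col_scores[j] * sum(row[j] for row in table)
        for j in range(len(col_scores))
    ) / n

    sxy = sxx = syy = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            d_row = row_scores[i] - mean_row
            d_col = col_scores[j] - mean_col
            sxy += count * d_row * d_col
            sxx += count * d_row * d_row
            syy += count * d_col * d_col

    r = sxy / math.sqrt(sxx * syy)
    m2 = (n - 1) * r * r
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(m2 / 2))
    return r, m2, p_value


# Made-up counts. Rows: burnout no / yes. Columns: the 8 salary bands.
table = [
    [5, 8, 10, 12, 14, 12, 10, 9],
    [15, 12, 10, 8, 6, 5, 3, 2],
]
r, m2, p = linear_by_linear(table, row_scores=[0, 1], col_scores=list(range(8)))
print(round(r, 3), round(m2, 3), p)
```

Unlike the plain chi² or Fisher test on the 2 x 8 table, this statistic uses the ordering of the salary bands, so it tests specifically for a trend across bands.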
 