# Can I correlate means and if so what sample size do I need?

#### sangachris

##### New Member
Hello there!

I have a question about how to perform the data analysis for a research project.
Basically, I have two types of data from a survey:
For each section of the survey, I have qualitative data (free-text comments) and averages from closed-ended questions scaled from 1 to 5. I have a sentiment analysis tool that can score each comment from 1 to 5, and I want to correlate the average sentiment score per section with the corresponding average from the quantitative data. I hope this makes sense. If I do that, I only have 11 data points, and I don’t know whether that is enough for a real analysis. I am also not sure whether such a comparison between means is even allowed, because the quantitative sections obviously have different variances.
I hope you can shed some light on this issue!

#### Karabiner

##### TS Contributor
If I understand you correctly, there are 11 sections, and for each section you
have the average sentiment and the average item score?

How large was your sample size? And why do you want to use only the
aggregate data, instead of doing such a correlation across the individual
participants?

With kind regards

Karabiner

#### sangachris

##### New Member
> If I understand you correctly, there are 11 sections, and for each section you
> have the average sentiment and the average item score?

That is correct.

> How large was your sample size?

So the survey is a student feedback evaluation. I have one survey for each of two university courses. Each survey has 6 sections and there are on average 20 comments per section.

> And why do you want to use only the
> aggregate data, instead of doing such a correlation across the individual
> participants?

I do have all of the comments and can score them individually; however, I only have the aggregated item scores, because the data I received is a summary of the course evaluation.

My hypothesis is that sentiment scores can predict the item scores.

#### Karabiner

##### TS Contributor
> Each survey has 6 sections

Not 11?

I suppose that technically you can perform a correlational analysis with n=11 data points.
The statistical power will be low, though, especially if the association is only weak. You can
consider using the Spearman correlation; its significance test makes fewer assumptions
than Pearson's and is more suitable for small samples.
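
A minimal sketch of such a Spearman analysis in plain Python, assuming made-up section means (the thread does not give the actual values). The helper names and the Monte Carlo permutation test are illustrative choices, not something prescribed above; the permutation test is used here because it avoids large-sample approximations.

```python
import random
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def permutation_p_value(x, y, n_perm=5000, seed=0):
    """Two-sided Monte Carlo permutation p-value for Spearman's rho."""
    rng = random.Random(seed)
    rx, ry = average_ranks(x), average_ranks(y)
    observed = abs(pearson(rx, ry))
    shuffled = ry[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(rx, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical per-section means (12 sections), NOT the poster's data.
sentiment = [3.1, 3.8, 2.9, 4.2, 3.5, 4.0, 2.7, 3.9, 3.3, 4.4, 3.0, 3.6]
item_mean = [3.4, 3.9, 3.2, 4.1, 3.3, 4.3, 3.0, 3.7, 3.6, 4.5, 3.5, 3.8]

rho = spearman(sentiment, item_mean)
p = permutation_p_value(sentiment, item_mean)
print(f"rho = {rho:.2f}, permutation p = {p:.4f}")
```

Each data point here is a section, so the result describes how section-level averages move together, not how individual respondents behave.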

Mind that you cannot straightforwardly apply findings from group-level data to individuals
(ecological fallacy).

With kind regards

Karabiner

#### sangachris

##### New Member
Sorry, my mistake: I should have written 12 (so 6 sections x 2 surveys = 12).

Spearman showed a correlation of .69 with a significance of .019. What sample size would I require to make my case stronger?

#### Karabiner

##### TS Contributor
.019 is smaller than the usual significance threshold (5%). So you can reject the null hypothesis, which claims
"in the population from which the data are sampled, the correlation is rho = 0.000000... ".

You could construct a 95% confidence interval for your correlation coefficient, just to get an idea of
the possible variability of the results.
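
One common way to get such an interval is the Fisher z-transformation. The sketch below assumes r = .69 and n = 12 (the thread mentions both 11 and 12, so n is an assumption) and uses the standard error 1/sqrt(n - 3), which was derived for Pearson's r; for a Spearman coefficient with so few points it is only a rough approximation. The function name `fisher_ci` is made up for illustration.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation via Fisher's z."""
    if n <= 3:
        raise ValueError("need n > 3 for the Fisher approximation")
    z = atanh(r)                      # transform r to the z scale
    se = 1 / sqrt(n - 3)              # approximate standard error on that scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return tanh(z - crit * se), tanh(z + crit * se)  # back-transform the limits

# r and n as reported in the thread; n = 12 is an assumption (could be 11).
low, high = fisher_ci(0.69, 12)
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```

With these inputs the interval is wide, running from roughly 0.2 to 0.9, which illustrates how much uncertainty remains with a dozen data points.
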
> What sample size would I require to make my case stronger?

Any increase is an improvement. But this is an academic question here, no? You have
11 (or 12? it's a bit chaotic) sections in your questionnaire and you cannot increase that
number (I suppose).
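
For reference, a rough sketch of how the required number of data points depends on the size of the true correlation, using the usual Fisher z approximation for a two-sided test. The defaults (alpha = 0.05, power = 0.80) and the function name `required_n` are assumptions for illustration, and the formula is based on Pearson's r, so for Spearman it is only a ballpark figure.

```python
from math import atanh, ceil
from statistics import NormalDist

def required_n(rho, alpha=0.05, power=0.80):
    """Approximate n needed to detect a true correlation rho (two-sided test)."""
    if rho == 0 or abs(rho) >= 1:
        raise ValueError("rho must be non-zero and strictly between -1 and 1")
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = norm.inv_cdf(power)          # quantile for the desired power
    return ceil(((z_alpha + z_power) / atanh(abs(rho))) ** 2 + 3)

for rho in (0.3, 0.5, 0.7):
    print(f"true rho = {rho}: n of about {required_n(rho)}")
```

The weaker the true association, the more data points are needed, which is why a small number of sections limits what can be detected.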

With kind regards

Karabiner