# Calculate Standard Error over averaged responses or raw responses?

#### RyanB

##### New Member
Hi,

I would like to calculate the standard error (SE) of my dependent variable. However, I have psycholinguistic data, which uses multiple participants (ppts) and multiple items. Thus, there are two ways of doing this: either by calculating the SE over averaged responses (Example 1) or by calculating the SE over raw responses (Example 2).

1. Average over items by participant (as in Table 1). Then calculate mean and SE from this table.
Table 1.

...which gives mean = 62.67 and SE = 16.63.

2. List each observation on its own row (as opposed to average observation by ppt) - as in Table 2.
Table 2:

Because N is greater in Example 2, the SE will be different (mean = 62.67; SE = 17.3). The example I give is very simple, but the fact that you can get different SEs is important, especially if I were using the SE to compare means in two different conditions. My question is: which method is best for calculating the standard error?

Many thanks! Although this is a simple question, I really appreciate any insights.
Ryan


#### obh

##### Well-Known Member
Hi,

First, if this is sample data, you need to calculate the sample standard deviation using (n-1) instead of n.

What is PPT? Anyway, if you want to calculate the standard deviation of a dependent variable, you need to use all the data (Table 2).

Mean: 62.66667, S: 18.371173

#### RyanB

##### New Member
Thank you for your response, it makes sense! PPT refers to participant (i.e., PPT 1 is data observed from the first participant, PPT 2 is data from the second participant, and so on).

I expected this was the answer, but it is interesting because a lot of standard textbooks don't discuss this issue. They simply present summary data that averages a response for each participant (i.e., they have already averaged over items) and calculate the SE from there, as in Example 1 above. I understand that would be the wrong approach and that the approach used in Example 2 is best practice. It would be good if such textbooks provided a footnote to explain this.

Many thanks,
Ryan

#### obh

##### Well-Known Member
Do you mean all the observations of PPT1 are from the same person? That is, PPT1 is person 1, PPT2 is person 2, etc.?

#### RyanB

##### New Member
Yes, that's correct. Is Example 2 still the best way to calculate the SE?

#### RyanB

##### New Member
OK, so it's Example 1? Or is there even a 'correct' choice? Example 2 might be more appropriate because it doesn't throw any data away, i.e., it calculates the variance of every observation from the average. On the other hand, if we aren't allowed to use multiple responses from the same participant, then Example 1 would be more appropriate.

#### obh

##### Well-Known Member
Okay, now I understand your question...

Let's look at the two extremes:

Example A. If this is a repeated sample (the same participant measured on the same item):
PPT1, condition A, item1, 33
PPT1, condition A, item1, 31
PPT1, condition A, item1, 35

In this case, it makes sense to first take the averages and then calculate the standard deviation of the averages (Table 1).

Example B. If this were a different random sample from the population, with different PPTs, it would make more sense to use Table 2.

PPT1, condition A, item1, 33
PPT2, condition A, item2, 31
PPT3, condition A, item3, 35
PPT4, condition A, item4, 33
PPT5, condition A, item5, 31
PPT6, condition A, item6, 35

In your example, I assume both options are statistically correct.
Now the question is: what do you want to show?

If you expect a similar result for any item and condition, and the variance is mainly due to the person, it is more like Example A.
Then the standard deviation will be by PPT.

If you expect a different result for any item or condition, it is more like Example B.
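
To make the two routes concrete, here is a minimal Python sketch. The numbers are made up for illustration (they are not the data from Table 1 or Table 2), and the helper name `standard_error` is just my own choice. It computes the SE once from the per-participant averages and once from every raw observation, using the sample standard deviation (n-1) in both cases.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical data: three participants, three items each (NOT the thread's tables).
raw = {
    "PPT1": [33, 31, 35],
    "PPT2": [43, 41, 45],
    "PPT3": [53, 51, 55],
}

def standard_error(values):
    """Sample standard deviation (n-1 denominator) divided by sqrt(n)."""
    return stdev(values) / sqrt(len(values))

# Route 1 (like Table 1): average over items per participant, then SE of those averages.
ppt_means = [mean(obs) for obs in raw.values()]
se_by_ppt = standard_error(ppt_means)

# Route 2 (like Table 2): one row per observation, SE over all raw values.
all_obs = [x for obs in raw.values() for x in obs]
se_raw = standard_error(all_obs)

print(f"mean of participant means = {mean(ppt_means):.2f}, SE = {se_by_ppt:.3f}")
print(f"mean of raw observations  = {mean(all_obs):.2f}, SE = {se_raw:.3f}")
```

With a balanced design like this one the two means agree, but the SEs differ because route 1 uses n = 3 (participants) while route 2 uses n = 9 (observations).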

#### obh

##### Well-Known Member
I updated my answer; is it clear now?
"Table 1" doesn't throw any data away; it just uses it in a different way.

Both options are "correct"; the question is what you want to show.

Another simple example, using test results:

subject, #students, mark
History, 1000, 80
Math, 10, 90

The average over all tests taken at the school is (1000*80+10*90)/(1000+10)=80.099
The average of the subject marks is (80+90)/2=85

Both averages are correct; the only question is what you want to show.
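
The same arithmetic as a short Python sketch, in case it helps to see the two averages side by side. The variable names are my own; the numbers are the ones from the table above.

```python
# subject -> (number of students, average mark), as in the table above
subjects = {"History": (1000, 80), "Math": (10, 90)}

# Average over every individual test: weight each subject by its number of students.
total_students = sum(n for n, _ in subjects.values())
per_test_avg = sum(n * mark for n, mark in subjects.values()) / total_students

# Average over subjects: each subject counts once, regardless of class size.
per_subject_avg = sum(mark for _, mark in subjects.values()) / len(subjects)

print(f"average per test:    {per_test_avg:.3f}")
print(f"average per subject: {per_subject_avg:.1f}")
```

The first figure is dominated by History because it has 100 times as many students; the second treats the two subjects equally.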
