Is the difference significant?

#1
I've been using Stata for only a few weeks, for my thesis, and could use some help with the following.
I wanted to know the mean grade (tentcijf) for different categories of hours of work (tijdwerku). Because tijdwerku was not yet divided into categories, I did the following:
sum tentcijf if xjaars==1 & tijdwerku ==0
sum tentcijf if xjaars==1 & tijdwerku >0 & tijdwerku <=5
sum tentcijf if xjaars==1 & tijdwerku >=6 & tijdwerku <=10
sum tentcijf if xjaars==1 & tijdwerku >=11 & tijdwerku <= 15
sum tentcijf if xjaars==1 & tijdwerku >=16 & tijdwerku <= 20
sum tentcijf if xjaars==1 & tijdwerku >=21
This gave me a mean grade (tentcijf) for each category.
However, I now want to know whether these means are significantly different from each other. I think what makes it difficult is that tijdwerku (hours of work) has no subgroups of its own. Does anyone know how I can find out?
Many thanks in advance
 

bukharin

RoboStataRaptor
#2
The classic test to compare the means of different groups is analysis of variance (ANOVA). The data should be normally distributed.

You should recode the continuous variable into groups. It sounds like tijdwerku is always an integer, so:
recode tijdwerku (0=0 "0") (1/5=1 "1-5") (6/10=6 "6-10") (11/15=11 "11-15") (16/20=16 "16-20") (21/max=21 "21+"), gen(tijdwerku6cat)
(that's all one line)

Then you can check the means with:
mean tentcijf if xjaars==1, over(tijdwerku6cat)

And compare the means with:
anova tentcijf tijdwerku6cat if xjaars==1
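
If the overall test suggests that the means differ, a natural follow-up is to see which pairs of categories differ. A minimal sketch, assuming tijdwerku6cat was created by the recode line above; the Bonferroni adjustment is just one possible choice, not something this thread prescribes:

```stata
* One-way ANOVA with a table of group means and
* Bonferroni-adjusted pairwise comparisons between the categories
oneway tentcijf tijdwerku6cat if xjaars==1, tabulate bonferroni
```

The overall F test from oneway is the same test as the anova line; the extra table reports each pairwise difference in means with an adjusted p-value.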

You generally get more information by doing this kind of analysis as a regression, e.g.:
scatter tentcijf tijdwerku if xjaars==1
regress tentcijf tijdwerku if xjaars==1

Also note that this line of yours contains a subtle but important error:
sum tentcijf if xjaars==1 & tijdwerku >=21

This gives you the summary statistics of tentcijf where xjaars==1 & tijdwerku >=21. BUT if tijdwerku is missing, then "tijdwerku >=21" is also true, because Stata treats a missing numeric value as larger than any number. Such lines should therefore always be written like this:
sum tentcijf if xjaars==1 & tijdwerku >=21 & !missing(tijdwerku)
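
To see whether this actually affects your data, you can compare the number of observations selected with and without the guard. A small sketch using only the variables already mentioned in this thread:

```stata
* Missing (.) sorts above every number, so this displays 1 (true)
display (. >= 21)

* Observations picked up by the unguarded condition
count if xjaars==1 & tijdwerku >=21

* Same condition with missing values excluded
count if xjaars==1 & tijdwerku >=21 & !missing(tijdwerku)
```

If the second count is smaller than the first, observations with missing tijdwerku were being included in the "21+" group.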
 

Dason

Ambassador to the humans
#3
"The classic test to compare the means of different groups is analysis of variance (ANOVA). The data should be normally distributed."
A small slip here: we don't actually care what the raw data look like. What matters is whether the errors are normally distributed, which is assessed by looking at the residuals.
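
One way to look at the residuals in Stata is sketched below. It assumes the tijdwerku6cat variable from post #2 exists; res_tent is a made-up name for the new residual variable, and the particular checks shown are only suggestions:

```stata
* Fit the group-means model (equivalent to the one-way ANOVA)
regress tentcijf i.tijdwerku6cat if xjaars==1

* Store the residuals for the observations used in the model
* (res_tent is a hypothetical variable name)
predict double res_tent if e(sample), residuals

* Visual checks: normal quantile plot, and histogram with a normal curve
qnorm res_tent
histogram res_tent, normal

* Optional formal test of normality (Shapiro-Wilk)
swilk res_tent
```

With a reasonably large sample, the plots are usually more informative than the formal test, which tends to flag even small departures from normality.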