Representing interval data

#1
I have been asked to help someone who has collected survey data on government facilities provided to the public. The data cover small-scale entrepreneurs across 3 cities who provide the services to customers, and look like this (sample data):

Years of experience of the entrepreneurs in City 1:
0-5 yrs: 15
5-10 yrs: 25
10-20 yrs: 30
>20 yrs: 3

Years of experience of the entrepreneurs in City 2:
0-5 yrs: 10
5-10 yrs: 27
10-20 yrs: 35
>20 yrs: 0

Years of experience of the entrepreneurs in City 3:
0-5 yrs: 5
5-10 yrs: 15
10-20 yrs: 25
>20 yrs: 5

I have to suggest some kind of statistical analysis, such as representing the data in a box plot or otherwise showing the averages graphically. Currently, he uses simple column graphs for each of the cities.
 
#2
Were the categories you listed what was shown to respondents, or were they asked to answer with a number of years, and the results you showed are collapsed into categories?

If the latter, you can't show means. I've seen some people basically take the midpoint of each category, but I find that very problematic, and it would likely lead to a false or misleading result. It's also hard to decide what value to give to the >20 category, which has no upper bound. Anyway, I wouldn't advise it.
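To make the midpoint problem concrete, here is a minimal sketch in Python using the sample counts from post #1. The band midpoints (2.5, 7.5, 15) and the two candidate values for the open-ended >20 band (22.5 and 35) are my own assumptions for illustration, not anything from the survey; the point is only that the "mean" moves with an arbitrary choice.

```python
# Respondent counts per experience band (0-5, 5-10, 10-20, >20 yrs),
# copied from the sample data in post #1.
counts = {
    "City 1": [15, 25, 30, 3],
    "City 2": [10, 27, 35, 0],
    "City 3": [5, 15, 25, 5],
}


def midpoint_mean(freqs, open_ended_value):
    """Grouped-data mean using band midpoints.

    The last band (>20 yrs) has no upper bound, so its representative
    value is a guess that the caller must supply.
    """
    midpoints = [2.5, 7.5, 15.0, open_ended_value]
    return sum(f * m for f, m in zip(freqs, midpoints)) / sum(freqs)


for city, freqs in counts.items():
    low = midpoint_mean(freqs, 22.5)   # assumption: ">20" is roughly 20-25 yrs
    high = midpoint_mean(freqs, 35.0)  # assumption: ">20" is roughly 20-50 yrs
    print(f"{city}: {low:.2f} vs {high:.2f} years")
```

City 2 has nobody in the >20 band, so its figure does not move, while City 3 shifts by more than a year purely because of the guess. That makes any comparison of cities on these "means" depend on the guess rather than on the data.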

If you're looking for a statistical test, you could do a chi-square test to determine whether there is an association between city and years of experience, but that's about it.
 

gianmarco

TS Contributor
#3
Hello!
As suggested by Injektilo, you can build a contingency table with experience class and city as the row and column categories. The chi-square test returns a p value of 0.09, which is not significant at alpha 0.05. It must be said, however, that an assumption of the chi-square test is not met: the usual rule of thumb allows at most 20% of cells to have expected counts below 5, and here all three cells of the >20 yrs column (25% of the table) fall below that.
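For anyone who wants to reproduce the contingency-table test, below is a rough sketch in plain Python on the sample counts from post #1. The function names are hypothetical, and the p-value uses the closed-form upper tail of the chi-square distribution that holds only for an even number of degrees of freedom (here 6). In practice a library routine such as `scipy.stats.chi2_contingency` would normally be used instead. Because this is the asymptotic p-value, it need not match a figure obtained by another method (for example simulation).

```python
import math

# Rows: City 1, City 2, City 3. Columns: 0-5, 5-10, 10-20, >20 yrs.
observed = [
    [15, 25, 30, 3],
    [10, 27, 35, 0],
    [5, 15, 25, 5],
]


def chi_square_independence(table):
    """Pearson chi-square statistic, degrees of freedom and expected counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof, expected


def chi2_upper_tail_even_dof(x, dof):
    """P(X > x) for a chi-square variable; closed form valid only for even dof."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("closed form requires a positive even dof")
    half = x / 2
    return math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(dof // 2)
    )


stat, dof, expected = chi_square_independence(observed)
p_value = chi2_upper_tail_even_dof(stat, dof)

# Rule-of-thumb check: how many cells have an expected count below 5?
low_cells = sum(e < 5 for row in expected for e in row)
n_cells = sum(len(row) for row in expected)

print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p_value:.3f}")
print(f"{low_cells} of {n_cells} cells have expected count < 5")
```

The low expected counts all sit in the >20 yrs column, so one option (a judgment call, not something suggested in the thread) would be to merge that band with 10-20 yrs before testing.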

As for 'representing' the data, you could use a mosaic plot (link) (link), or you could plot the standardized residuals to have a visual idea of the 'association' between the levels of the two categorical variables being analysed.
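As a rough illustration of the residuals idea, the sketch below computes adjusted standardized residuals for the sample table from post #1. The formula used, (O - E) / sqrt(E * (1 - row share) * (1 - column share)), is one common definition and is my assumption about what is meant here; the names are hypothetical. Values beyond roughly +/-2 point at the cells that drive any association.

```python
import math

cities = ["City 1", "City 2", "City 3"]
bands = ["0-5", "5-10", "10-20", ">20"]

# Observed counts from post #1 (rows = cities, columns = experience bands).
observed = [
    [15, 25, 30, 3],
    [10, 27, 35, 0],
    [5, 15, 25, 5],
]


def adjusted_residuals(table):
    """Adjusted standardized residuals of a two-way contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        res_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected under independence
            # Variance adjustment for the row and column margins.
            scale = e * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            res_row.append((o - e) / math.sqrt(scale))
        result.append(res_row)
    return result


std_res = adjusted_residuals(observed)

# Plain-text table; these values could equally be fed to a heat map or bar chart.
print("        " + "".join(f"{b:>8}" for b in bands))
for city, row in zip(cities, std_res):
    print(f"{city:<8}" + "".join(f"{r:8.2f}" for r in row))
```

On these sample counts the largest residuals in absolute value are in the >20 yrs column (positive for City 3, negative for City 2), which is also where the expected counts are smallest, so they should be read with caution.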

cheers
gm