Help with our school project, a statistical survey on education


The statistics students at my high school have to do a project. The topic we have chosen is Educational Performance in Kolkata Schools and Its Relation to Gender and Gender Interactions. I outline our goals below:

1) Collect data about Madhyamik results (Madhyamik is the centralised exam, organised by the state education board, that students take to pass the 10th grade). We will sample marks from all-boys schools, all-girls schools and co-educational schools. We have selected schools of similar average educational performance. The average socio-economic condition of the students is also approximately the same, which we hope eliminates that variable.

2) We will examine various parameters of the marks, with emphasis on the difference between the boys' schools and the girls' schools. These include measures of central tendency, standard deviation, skewness etc. We will also examine the myth that girls are relatively stronger at humanities subjects while boys are stronger at science subjects. We may also examine whether the population is heterogeneous (that is, whether 'good students' and 'bad students' form two distinctly distributed populations).

3) We will then analyse the data obtained from the co-educational schools and try to determine whether the co-educational environment lessens the difference between boys and girls.
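To make goal 2 concrete, here is a minimal sketch of the summary we have in mind for each school type. The marks are made-up numbers, the names `summarise` and `marks_by_type` are only for illustration, and the skewness is the simple moment-based version (third central moment divided by the cube of the population standard deviation), which may not be the definition a statistician would prefer.

```python
from statistics import mean, median, pstdev

def summarise(marks):
    """Return basic descriptive statistics for one list of marks."""
    m = mean(marks)
    sd = pstdev(marks)  # population standard deviation
    # Moment-based skewness: third central moment divided by sd cubed.
    skew = sum((x - m) ** 3 for x in marks) / (len(marks) * sd ** 3)
    return {
        "n": len(marks),
        "mean": m,
        "median": median(marks),
        "sd": sd,
        "skewness": skew,
    }

# Made-up marks, NOT real Madhyamik data.
marks_by_type = {
    "boys": [40, 50, 60, 70, 80],
    "girls": [45, 50, 55, 60, 90],
}

summary = {school_type: summarise(m) for school_type, m in marks_by_type.items()}
for school_type, stats in summary.items():
    print(school_type, stats)
```

In this toy data both groups have the same mean, but the second list has a long upper tail, so its skewness comes out positive while the first is symmetric.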

Now, we are all grossly inexperienced, and this is the first time we will attempt a statistical study instead of working in our classrooms with provided data. I'm eagerly seeking suggestions from experienced people about possible pitfalls and how to make our results statistically meaningful.

I'm also looking for specific help on the following topics:

i) What is a suitable measure of whether girls are stronger at some subjects while boys are stronger at others? I'm thinking of comparing the percentage of total marks obtained in one subject, standardised against the whole population. For example, consider the variable X = percentage of a student's total marks earned in History+Geography. Next, we define Z as the standardised variate corresponding to X, standardised with respect to the entire population (that is, subtract the population mean of X and divide by the population standard deviation of X). Now we compute the mean of Z over the girls' schools (E_1(Z)) and the mean over the boys' schools (E_2(Z)).

If the first value is larger than the second (it seems one will have to be positive and the other negative, since Z averages to zero over the whole population), then we may say that girls favour humanities over other subjects more than boys do. Next we can do the same analysis on the boys vs girls population in coed schools and see if the difference is smaller. By using the absolute marks instead of expressing them as a percentage of total marks, we can also compare the relative performance (as opposed to preference) of boys and girls in humanities. The same can be done for languages and sciences. Is this a statistically sound measure (unlikely, since I just made it up)? What are the alternatives?
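Here is a minimal sketch of the calculation described above, so that it is clear what we mean. The records are made-up numbers, and the function name `standardised_group_means` and the tuple layout are only assumptions for illustration.

```python
from statistics import mean, pstdev

def standardised_group_means(records):
    """records: list of (group, humanities_marks, total_marks) tuples.

    Returns the mean of Z for each group, where Z is the humanities share
    of total marks standardised against the whole population.
    """
    # X = percentage of total marks earned in humanities, per student.
    shares = [100.0 * hum / total for _, hum, total in records]
    mu = mean(shares)
    sigma = pstdev(shares)
    z_values = [(x - mu) / sigma for x in shares]

    by_group = {}
    for (group, _, _), z in zip(records, z_values):
        by_group.setdefault(group, []).append(z)
    return {group: mean(zs) for group, zs in by_group.items()}

# Made-up records, NOT real data: (group, History+Geography marks, total marks)
records = [
    ("girls", 150, 600), ("girls", 160, 640), ("girls", 140, 500),
    ("boys", 120, 600), ("boys", 130, 650), ("boys", 110, 500),
]

means = standardised_group_means(records)
# Difference E_1(Z) - E_2(Z), in units of the whole-population standard deviation.
difference = means["girls"] - means["boys"]
print(means, difference)
```

With equal group sizes the two means are equal and opposite, which matches the remark that one must be positive and the other negative. The sketch only computes the difference; it does not say whether that difference is larger than chance would produce, which is the part we need help with.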

ii) What is a good way of identifying whether the population in a school indeed consists of discrete strata? These could be good students/bad students (there is an indication from previous results that this may be the case) or, in coed schools, boys/girls (very likely the case). In coed schools there may even be four strata: good boys, good girls, bad boys, bad girls. It will be interesting to study whether bad boys vs bad girls show more difference than good boys vs good girls. All this sounds very pretty, but I don't know how to separate the population into strata.
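One approach we have read about, and which may or may not be appropriate here, is to fit a mixture of two normal distributions to the marks and compare it with a single normal fit. Below is a rough sketch using the EM algorithm on simulated marks. Everything in it is an assumption for illustration: the simulated data, the function names, the floor of 1 mark on the standard deviation, and the use of BIC (lower is better) to compare the two fits.

```python
import math
import random
from statistics import mean, pstdev

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fit_two_normals(xs, iterations=200):
    """Fit a two-component normal mixture by EM; returns (weights, means, sds)."""
    xs = sorted(xs)
    n = len(xs)
    half = n // 2
    # Crude starting point: lower half vs upper half of the sorted marks.
    mu = [mean(xs[:half]), mean(xs[half:])]
    sigma = [pstdev(xs), pstdev(xs)]
    weight = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: probability that each mark belongs to each component.
        resp = []
        for x in xs:
            p = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in (0, 1)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weight, mean and sd of each component.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            weight[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = max(math.sqrt(var), 1.0)  # floor avoids a collapsing component
    return weight, mu, sigma

def bic(log_likelihood, n_params, n):
    return n_params * math.log(n) - 2 * log_likelihood

# Simulated marks, NOT real data: two groups centred at 45 and 75.
random.seed(0)
marks = [random.gauss(45, 6) for _ in range(150)] + [random.gauss(75, 6) for _ in range(150)]

weight, mu, sigma = fit_two_normals(marks)

m_all, sd_all = mean(marks), pstdev(marks)
loglik_one = sum(math.log(normal_pdf(x, m_all, sd_all)) for x in marks)
loglik_two = sum(
    math.log(sum(weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in (0, 1)))
    for x in marks
)
bic_one = bic(loglik_one, 2, len(marks))  # one mean, one sd
bic_two = bic(loglik_two, 5, len(marks))  # two means, two sds, one free weight
print(weight, mu, sigma, bic_one, bic_two)
```

A caveat we are aware of: real marks are bounded and often skewed, so a two-component fit can look better than a single normal even when there is really only one population. We would treat a result like this as a hint, not as proof of two strata.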

iii) Is there some easily available (preferably free) software that will let me do all this analysis (brownie points for fitting probability distributions and graphing)? It would be a nightmare to do this by hand, since we usually work with fewer than 50 data points instead of several hundred.

iv) As it stands right now, we will sample two boys' schools, two girls' schools and one coed school. Is this enough to give statistically significant results? How many data points should we sample from each school? Should this be a constant, or proportional to the total number of students?
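As far as we can tell, one textbook starting point is the normal-approximation formula for comparing two means, n = 2 (z_(1-alpha/2) + z_power)^2 sigma^2 / delta^2 per group, where sigma is the standard deviation of marks and delta is the smallest difference we care about detecting. The sketch below evaluates it and also shows proportional allocation across schools. The values of sigma and delta, the school sizes and the function names are all made up for illustration.

```python
import math
from statistics import NormalDist

def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Students needed in EACH group to detect a mean difference of delta
    with a two-sided test at level alpha (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)

def proportional_allocation(school_sizes, total_sample):
    """Split total_sample across schools in proportion to their size.
    Rounding means the parts may not add up exactly to total_sample."""
    total = sum(school_sizes.values())
    return {school: round(total_sample * size / total)
            for school, size in school_sizes.items()}

# Hypothetical inputs: sd of 12 marks, smallest interesting difference of 5 marks.
needed = n_per_group(sigma=12, delta=5)

# Hypothetical school sizes, NOT the real schools.
allocation = proportional_allocation({"A": 200, "B": 300, "C": 500}, total_sample=100)
print(needed, allocation)
```

One limitation we are aware of: the formula treats students as independent draws. Students in the same school tend to resemble each other, so with only five schools a difference between school types could simply reflect differences between those particular schools.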

v) Finally, is the whole proposition so glaringly ridiculous that all serious statisticians will simply laugh at it? I hope not :redface:

I hope you will help out. We have in all probability bitten off more than we can chew. But we are hoping to do some meaningful work publishable in a journal, so we need all the help we can get. I would also be very grateful if you could give me the email address of someone who may be able and willing to help. We will be marked on this in our school-finishing (and career-determining) central exams, so this is very important to our whole class. Thanks a lot.
