# Two Variables, Many Components :( Type 1 Error?

#### Elle_Belle

##### New Member
Hey,
Have spent ages trawling through the threads before deciding I would have to bother everyone with yet another question from a dissertation student :$ I can’t seem to find any similar problems, and I have been in and out of books for days, so any help on this would be _greatly_ appreciated. I am a third-year student studying towards a BSc (Hons) in Equine Science and am currently trying to do the statistics for my undergraduate dissertation. I am in dire need of a bit of advice (it’s due in a week :S)!

My dissertation looks at the effect of racing career on stud career in thoroughbred flat racing, focusing on the horses that finished first and second in the Epsom Derby from 1993 to the present day. The analysis splits into two parts: the first compares the racing careers of the horses that came first in the Derby with those of the horses that came second; the second compares the stud careers of the same two groups. My hypothesis is that horses which came first in the Derby will have more successful stud careers than the horses that came second, regardless of racing performance.
The problem I am having is that many components contribute to the racing and stud careers. For example:

In racing:

- highest Timeform rating
- Timeform rating after the Derby
- sire’s highest Timeform rating
- damsire’s highest Timeform rating
- percentage of Group One starts won (excluding the Derby)
- percentage of starts that were Group One races
- career prize money earned
- Racing Performance rating (an index)

At stud:

- stud fee
- number of mares covered
- percentage of yearlings offered that sold
- deviation of the stallion’s average yearling price from the average market yearling price
- deviation of the highest-selling yearling price from the average yearling price that year

Because I could not give these components a suitable weighting to produce a single rating for each horse, I decided to analyse each component individually. I ran normality tests on each component (all differed significantly from a normal distribution) before comparing the first- and second-placed horses on each one. For example, I carried out a Mann-Whitney U test (in SPSS) on the highest Timeform rating component, comparing the first- and second-placed horses; this found no significant difference between the highest Timeform ratings of the horses that came first and those of the horses that came second. I then carried out a Mann-Whitney U test on the horses’ Timeform rating after the race, which likewise found no significant difference between the two groups. I repeated this for every racing-career component.
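For anyone wanting to reproduce the component-by-component procedure outside SPSS, here is a minimal stdlib-only Python sketch of a two-sided Mann-Whitney U test, using the normal (asymptotic) approximation with a tie correction, applied to each component in turn. The function names and the `{component: (winners, seconds)}` input format are hypothetical, no real Derby data is included, and SPSS can also report exact p-values, so small-sample results may differ slightly from this approximation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(first: Sequence[float], second: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U for the first group, approximate two-sided p-value).
    """
    n1, n2 = len(first), len(second)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one observation")
    n = n1 + n2
    pooled = list(first) + list(second)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: sum of (t^3 - t) over each group of t tied values
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # every value identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


def p_values_by_component(
    components: Dict[str, Tuple[Sequence[float], Sequence[float]]]
) -> Dict[str, float]:
    """One Mann-Whitney U test per component: {name: (winners, seconds)} -> {name: p}."""
    return {name: mann_whitney_u(w, s)[1] for name, (w, s) in components.items()}
```

Each component gets its own p-value, compared separately against 0.05; running one test per component is what raises the multiple-testing question.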
All the tests found no significant difference (p > 0.05) between the racing careers of the first- and second-placed horses, suggesting that the two groups were similarly successful on the track. The purpose of testing the stud careers would therefore be to determine whether a Derby winner has a better stud career because he won a famous race, rather than because he was an overall better horse. I would then test the stud careers in the same way, concluding either:

- the stud careers of the Derby winners were significantly better than those of the horses that came second (if there was a significant difference in the majority of the stud-career components), or
- the stud careers of the Derby winners were not significantly better than those of the horses that came second (if there was no significant difference in the majority of the stud-career components).

However, I have been told that testing the data in this way would lead to a Type I error (is that right?), and that this jeopardises the validity of the results.

Obviously the ideal way to analyse the data would be to have one number for racing career and one number for stud career for each horse. I could then determine whether there was a significant difference in racing-career numbers between the first- and second-placed horses, before analysing whether there was a significant difference in their stud-career numbers. However, as some components are more ‘important’ than others (e.g. highest Timeform rating is more important than sire’s highest Timeform rating), I do not know how to consolidate the data down into one number :$
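The Type I error concern can be made concrete with a little arithmetic. This is a minimal Python sketch, assuming every null hypothesis is true and the tests are independent (the real components are correlated, so the true figure will differ, but the trend holds): with k tests each at alpha = 0.05, the chance of at least one spurious "significant" result is 1 - (1 - alpha)^k. The counts 8 and 5 are the racing and stud components listed above; `familywise_error_rate` and `bonferroni_alpha` are illustrative names, and the Bonferroni threshold alpha / k is shown only as the simplest standard adjustment, not as something from the original analysis.

```python
# Sketch: how the chance of at least one false positive grows with the number
# of separate tests. Assumes independent tests and all null hypotheses true
# (illustrative only; the real components are correlated).


def familywise_error_rate(alpha: float, k: int) -> float:
    """P(at least one p < alpha among k independent tests) when every null is true."""
    return 1 - (1 - alpha) ** k


def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-test threshold that keeps the familywise error rate at or below alpha."""
    return alpha / k


if __name__ == "__main__":
    # 8 racing components, 5 stud components, 13 tests in total
    for label, k in [("racing", 8), ("stud", 5), ("all", 13)]:
        print(f"{label:6s} k={k:2d}  FWER={familywise_error_rate(0.05, k):.3f}  "
              f"per-test alpha={bonferroni_alpha(0.05, k):.4f}")
```

Under these assumptions the familywise error rate comes out at about 0.34 for the 8 racing tests, 0.23 for the 5 stud tests and 0.49 for all 13, so roughly a coin-flip chance of at least one false "significant" difference across the whole analysis.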

Anyone have any advice? Other than choose an easier degree? Are there any tests that let me analyse my data as many components within two variables?

Any advice at all would be greatly appreciated. Apologies for posting an absolute essay, but it’s quite a difficult thing to explain!

Many many thanks,
Elle