# Comparative test to use

#### LZC

##### New Member
Hello!

I am trying to see if there is a difference in helmet use between 3 different groups of bicyclists (riding different vehicle types). All variables are categorical with a binary (0/1) outcome. Struggling to figure out which test to use given they are all categorical.

Thanks!

#### staassis

##### Active Member
How large is your data set? How many bicyclists are in each category?

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Given your answers to @staassis - you may be looking at multinomial logistic regression or ordinal logistic regression.

#### LZC

##### New Member
The sample sizes are unequal: 3350, 920 and 202 riders in the three categories. Helmet use is coded 0/1 for each rider. They're all different riders, so the samples are independent.

I basically want to know if there's a significant difference in helmet use between riders of each vehicle type, and then also within each group. I thought about doing a t-test for the latter for each group, but that just increases the error. Maybe an ANOVA for the former? I guess the question is: do I really need to do a comparative test, or is a regression more informative?

Thank you!!

#### staassis

##### Active Member
Thank you. And in each category, how many people use helmets?

#### staassis

##### Active Member
To see if there is a relationship between bicycle type and helmet usage, you can run a chi-square test for independence... You can also follow @hlsmith's suggestion and run a logistic regression of the form

Helmet Usage ~ Bicycle Type A + Bicycle Type B,

where 1) one of the bicycle types is left out as the reference category,
2) Bicycle Type A and Bicycle Type B are binary dummy variables.

The omnibus likelihood-ratio test for the estimated logit model will answer the same question as the aforementioned chi-square test (about the overall relationship)... Next, polish the logit model by keeping only statistically significant terms. Once you are done, the model will show you the direction of the effects. You will see which group is more helmet-friendly than another and by how much.
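A minimal sketch of both omnibus tests, using only the Python standard library. The group sizes are the ones given in the thread, but the helmet counts are hypothetical placeholders, since the thread never reports them. The sketch relies on one fact: when the logit model contains nothing but the group dummies, its omnibus likelihood-ratio statistic equals the G² statistic of the 3 x 2 table, and its dummy coefficients equal the log odds ratios against the reference group, so no model fitting is needed here. All function and variable names are illustrative.

```python
import math

# Group sizes are from the thread. The helmet counts are HYPOTHETICAL
# placeholders: the thread never reports them.
group_sizes = {"type_1": 3350, "type_2": 920, "type_3": 202}
helmet_counts = {"type_1": 1675, "type_2": 368, "type_3": 121}


def omnibus_statistics(sizes, helmets):
    """Return (Pearson chi-square, likelihood-ratio G^2) for a k x 2 table."""
    overall_rate = sum(helmets.values()) / sum(sizes.values())
    chi2 = 0.0
    g2 = 0.0
    for group, n in sizes.items():
        observed = (helmets[group], n - helmets[group])
        # Expected counts if helmet use were unrelated to vehicle type.
        expected = (n * overall_rate, n * (1 - overall_rate))
        for o, e in zip(observed, expected):
            chi2 += (o - e) ** 2 / e
            if o > 0:  # an empty cell contributes nothing to G^2
                g2 += 2 * o * math.log(o / e)
    return chi2, g2


def p_value_df2(statistic):
    """Upper-tail chi-square p-value; this closed form holds only for df = 2."""
    return math.exp(-statistic / 2)


def odds_ratios(sizes, helmets, reference):
    """Odds of helmet use in each group relative to the reference group."""
    odds = {g: helmets[g] / (sizes[g] - helmets[g]) for g in sizes}
    return {g: odds[g] / odds[reference] for g in sizes if g != reference}


chi2, g2 = omnibus_statistics(group_sizes, helmet_counts)
ratios = odds_ratios(group_sizes, helmet_counts, reference="type_1")
print(f"Pearson chi-square = {chi2:.2f}, p = {p_value_df2(chi2):.3g}")
print(f"Likelihood-ratio G^2 = {g2:.2f}, p = {p_value_df2(g2):.3g}")
for group, ratio in ratios.items():
    print(f"{group} vs type_1: odds ratio = {ratio:.3f}, "
          f"log-odds = {math.log(ratio):.3f}")
```

With three groups and a binary outcome the test has 2 degrees of freedom, which is why the simple exponential p-value formula applies. For other table shapes a statistics library such as SciPy (`scipy.stats.chi2_contingency`) would be the more practical choice.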

#### LZC

##### New Member
Very helpful, thank you so much!!

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Was assignment into the three groups randomized? If not, that is one of the benefits of a regression: you can control for background differences between the groups that may also affect the outcome. If you have, say, age discrepancies between groups, those could affect both the group self-enrollment and the outcome, and not controlling for them results in a type of residual bias.
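To illustrate how an uncontrolled background variable can distort a group comparison, here is a small sketch. It does not fit a regression; it swaps in Mantel-Haenszel stratification as a simpler stand-in for adjustment. It uses only two vehicle types and two age bands for brevity, and every count is hypothetical, chosen so that age is related to both vehicle type and helmet use.

```python
# Each stratum holds (helmet_A, no_helmet_A, helmet_B, no_helmet_B).
# All counts are HYPOTHETICAL and exist only to illustrate confounding.
strata = {
    "younger": (100, 300, 20, 60),
    "older": (60, 40, 240, 160),
}


def odds_ratio(a, b, c, d):
    """Odds of helmet use for type A relative to type B in one 2 x 2 table."""
    return (a * d) / (b * c)


def crude_odds_ratio(tables):
    """Odds ratio after pooling all strata, i.e. ignoring age entirely."""
    a, b, c, d = (sum(column) for column in zip(*tables.values()))
    return odds_ratio(a, b, c, d)


def mantel_haenszel_odds_ratio(tables):
    """Age-adjusted common odds ratio (Mantel-Haenszel estimator)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables.values())
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables.values())
    return numerator / denominator


for name, table in strata.items():
    print(f"{name}: odds ratio = {odds_ratio(*table):.2f}")
print(f"crude (age ignored): {crude_odds_ratio(strata):.2f}")
print(f"Mantel-Haenszel (age-adjusted): {mantel_haenszel_odds_ratio(strata):.2f}")
```

In these made-up numbers the two vehicle types have identical helmet odds within each age band, yet the pooled comparison suggests a large difference, purely because the age mix differs between the groups. A regression with age as an extra term addresses the same problem and extends to several covariates at once.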

Also, many people strongly encourage controlling for family-wise error when making multiple comparisons, in order to lessen the risk of type I errors (rejecting the null when it is true). So if you are planning on comparing the differences in outcome via multiple group comparisons, it may be prudent to investigate a correction.
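As a rough sketch of what such a correction could look like for the three pairwise comparisons: the code below runs a pooled two-proportion z-test for each pair and then applies the Holm step-down adjustment. Holm is just one family-wise method picked for illustration (Bonferroni is another); no specific method was named above. The group sizes are from the thread, while the helmet counts are hypothetical placeholders.

```python
import math
from itertools import combinations

# Group sizes are from the thread. The helmet counts are HYPOTHETICAL.
group_sizes = {"type_1": 3350, "type_2": 920, "type_3": 202}
helmet_counts = {"type_1": 1675, "type_2": 368, "type_3": 121}


def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity: an adjusted p never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


pairs = list(combinations(group_sizes, 2))
raw = [
    two_proportion_p(helmet_counts[a], group_sizes[a],
                     helmet_counts[b], group_sizes[b])
    for a, b in pairs
]
adjusted = holm_adjust(raw)
for (a, b), p, p_adj in zip(pairs, raw, adjusted):
    print(f"{a} vs {b}: raw p = {p:.3g}, Holm-adjusted p = {p_adj:.3g}")
```

Each adjusted p-value is compared against the usual threshold (for example 0.05), which keeps the chance of at least one false rejection across the three comparisons at or below that level.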

Thanks and welcome to the forum.