# How do I determine the minimum number of observations needed per category in categorical predictor variables?

#### drhb

##### New Member
When conducting a simple linear regression analysis with one categorical predictor variable and power projected at .80, G*Power suggests a minimum sample size of n = 55. Now, assume that the predictor has three categories, and that I obtain the 55 necessary responses. What does it mean for the regression analysis if the observations are distributed across the categories as something like 5, 10, and 40? That is, 5 observations in category #1, 10 in cat #2 and 40 in cat #3. Or consider a binary predictor variable with 5 in cat #1 and 50 in cat #2. I have achieved the minimum sample size projection, but do I have enough data across the categories? Is there a minimum number per category that I should be looking for?

Thanks!
drhsb

#### hlsmith

##### Less is more. Stay pure. Stay poor.
The thing to think about is what the mean and SE of the outcome would look like for the group of ten observations. The SE would likely be large and the confidence interval around the mean very wide, making comparisons between groups difficult.
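
To put rough numbers on that, here is a minimal sketch of how the standard error of a group mean shrinks with group size. It assumes every group has the same outcome standard deviation (`sd = 1.0` is a made-up value), and the 1.96 multiplier is the normal approximation, which understates the interval width for very small groups.

```python
import math


def standard_error(sd, n):
    """Standard error of a group mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


sd = 1.0  # assumed outcome SD, identical in every group (illustrative only)

for n in (5, 10, 40):
    se = standard_error(sd, n)
    # 1.96 * SE is a normal-approximation half-width; a t quantile would be wider
    print(f"n={n:>2}: SE = {se:.3f}, approx. 95% CI half-width = {1.96 * se:.3f}")
```

Under these assumptions the SE for the group of 5 is nearly three times the SE for the group of 40, since it scales with 1/sqrt(n).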

What is your hypothesis/question for the study?

#### drhb

##### New Member
Thanks!

There is no mean or SE for a categorical variable. Here are the hypotheses.

H0: The predictor variable is not a statistically significant predictor of the outcome.
H1: The predictor variable is a statistically significant predictor of the outcome.

So, does the category predict the outcome?

hb

#### hlsmith

##### Less is more. Stay pure. Stay poor.
You said "simple linear regression", so you will have a mean estimate of the dependent variable for each group, correct? An estimate based on a sample of ten would likely be imprecise.

So when you run the model, the intercept represents the reference category of the predictor, and the other two coefficients in the output represent how much higher or lower the outcome is in the remaining two categories relative to that reference. All of them are expressed as estimated mean values of the dependent variable.
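
A small sketch of that coding may help. It fits ordinary least squares by hand on an intercept plus 0/1 dummies, using only the standard library, and compares the coefficients with the raw group means. The category names and outcome values are invented for illustration, and `solve` and `fit_dummy_regression` are hypothetical helper names, not functions from any package.

```python
from statistics import mean


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_dummy_regression(groups, reference):
    """OLS of the outcome on an intercept plus one 0/1 dummy per non-reference category."""
    others = [g for g in groups if g != reference]
    X, y = [], []
    for g, values in groups.items():
        for v in values:
            X.append([1.0] + [1.0 if g == o else 0.0 for o in others])
            y.append(v)
    p = len(others) + 1
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return dict(zip(["intercept"] + others, solve(xtx, xty)))


# Made-up outcome values for three hypothetical categories
groups = {
    "small": [4.0, 5.0, 6.0],
    "medium": [6.0, 7.0, 8.0, 9.0],
    "large": [9.0, 10.0, 11.0, 12.0, 13.0],
}

coefs = fit_dummy_regression(groups, reference="small")
print("coefficients:", coefs)
print("group means: ", {g: mean(v) for g, v in groups.items()})
```

With "small" as the reference, the intercept matches the mean of "small", and each remaining coefficient matches that category's mean minus the mean of "small" (up to floating-point rounding). So every coefficient rests on the data in one or two groups, which is why a group of 5 or 10 matters even when the total n looks adequate.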

I asked about the H0 because it may dictate or inform the assumptions for the sample size calculation. I would just run a few simulations to see how the sample sizes would play out. So are you assuming all three groups would be significantly different from a null?
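
One way such a simulation could look, as a rough sketch rather than a definitive power analysis: draw normal outcomes for each group, compute the overall F statistic (for a single categorical predictor this is the same as the one-way ANOVA F), and count how often it exceeds a critical value. Everything numeric here is an assumption made for illustration: the group means `[0.0, 0.5, 1.0]`, the common SD of 1, the number of replications, and the near-balanced split 18/18/19. The critical value is itself estimated by simulating under the null so that no external library is needed.

```python
import random
from statistics import fmean


def f_statistic(samples):
    """One-way ANOVA F, equal to the overall F test for one dummy-coded categorical predictor."""
    all_values = [v for s in samples for v in s]
    grand = fmean(all_values)
    group_means = [fmean(s) for s in samples]
    k, n = len(samples), len(all_values)
    ss_between = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, group_means))
    ss_within = sum((v - m) ** 2 for s, m in zip(samples, group_means) for v in s)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def simulate_f(sizes, means, rng, sd=1.0):
    """Draw one data set with normal errors and return its F statistic."""
    samples = [[rng.gauss(mu, sd) for _ in range(n)] for n, mu in zip(sizes, means)]
    return f_statistic(samples)


def estimate_power(sizes, means, reps=2000, alpha=0.05, seed=1):
    """Monte Carlo power of the overall F test for the given group sizes and true means."""
    rng = random.Random(seed)
    # Estimate the critical value from the simulated null distribution (all means equal)
    null_f = sorted(simulate_f(sizes, [0.0] * len(sizes), rng) for _ in range(reps))
    critical = null_f[int((1 - alpha) * reps)]
    # Share of simulated data sets under the alternative that exceed the critical value
    hits = sum(simulate_f(sizes, means, rng) > critical for _ in range(reps))
    return hits / reps


assumed_means = [0.0, 0.5, 1.0]  # hypothetical effect, in SD units

for sizes in [(5, 10, 40), (18, 18, 19)]:
    print(sizes, "estimated power:", estimate_power(sizes, assumed_means))
```

Both designs have 55 observations in total, so any gap between the two estimates comes from the allocation alone. Under these assumed means the near-balanced split should come out clearly ahead, but the size of the gap depends on which categories are small and where the true differences lie, so it is worth rerunning with effect sizes that fit the study.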

#### drhb

##### New Member
I'm not interested in the groups, just the variable as a whole. For example, perhaps my predictor is "size" and my categories are "small," "medium," and "large." I just want to know if "size" predicts some outcome variable. If an a priori power analysis tells me I need 100 observations, is 100 total sufficient or do I need 100 for each category? This is what I am trying to determine.