# categorical variables in regression analysis and interaction terms

#### Clarkson

##### New Member
I am learning multiple regression with categorical variables and in a book I came across this problem.

For the yield of corn, suppose two factors are involved: nitrogen level and depth of ploughing. Say there are three nitrogen categories (1, 2, 3) and two depth categories (1, 2).

There's an interaction term, nitrogen*depth, as well.

If the dummy variables are introduced as
$E_{i1}=1$ if the $i$th observation has nitrogen level 1, 0 otherwise,
$E_{i2}=1$ if the $i$th observation has nitrogen level 2, 0 otherwise,

and $D=1$ if the depth category is 1, 0 otherwise,

Then the model would be
$Y=\beta_0+\beta_1E_1+\beta_2E_2+\beta_3D+\beta_4E_1D+\beta_5E_2D+\epsilon$
Is this correct?
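To make the coding concrete, here is a minimal sketch that builds one row of the design matrix for this model. The function name `design_row` is hypothetical (it is not from the book); it only encodes the dummy definitions given above, with nitrogen level 3 and depth category 2 as the reference levels.

```python
def design_row(nitrogen, depth):
    """Return [1, E1, E2, D, E1*D, E2*D] for one observation.

    Assumes nitrogen is coded 1, 2 or 3 and depth is coded 1 or 2,
    as in the question. Nitrogen 3 and depth 2 are the reference levels.
    """
    if nitrogen not in (1, 2, 3) or depth not in (1, 2):
        raise ValueError("nitrogen must be 1, 2 or 3 and depth must be 1 or 2")
    e1 = 1 if nitrogen == 1 else 0   # E_1: nitrogen level 1
    e2 = 1 if nitrogen == 2 else 0   # E_2: nitrogen level 2
    d = 1 if depth == 1 else 0       # D: depth category 1
    # The interaction columns are the products of the dummies.
    return [1, e1, e2, d, e1 * d, e2 * d]


# Print the design row for each of the six nitrogen/depth cells.
for n in (1, 2, 3):
    for dep in (1, 2):
        print((n, dep), design_row(n, dep))
```

Each of the six cells gets a distinct row, so the six coefficients can reproduce all six cell means. The cell with nitrogen 3 and depth 2 has mean $\beta_0$.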

When the t-statistics were computed, the significance value (p-value) for the term [nitrogen=1*depth=1] was 0.029, and for [nitrogen=2*depth=1] it was 0.290.
I was asked to interpret whether the interaction term significantly affects the yield of corn.

At the 5% significance level, the coefficient of the variable [nitrogen=2*depth=1] is clearly not significant. But since the coefficient of the variable [nitrogen=1*depth=1] is significant, can I say that the interaction term significantly affects the yield of corn?
If both coefficients had been insignificant, then I could have said that there isn't a significant effect from the interaction term, right?

Since the interaction term $[nitrogen=2*depth=1]$ had an insignificant coefficient, my fitted model would then be $Y=\beta_0+\beta_1E_1+\beta_2E_2+\beta_3D+\beta_4E_1D$,

leaving out $E_2D$. Is this correct?
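Because the nitrogen*depth interaction spans two coefficients ($\beta_4$ and $\beta_5$), one common way to assess it as a whole is a partial F-test that compares the full model with the model that drops both product terms. The sketch below assumes NumPy is available; the data are made up purely for illustration (they are not from the book), and the function names are hypothetical.

```python
import numpy as np


def design_matrix(nitrogen, depth, with_interaction=True):
    """Build columns [1, E1, E2, D] and, optionally, [E1*D, E2*D]."""
    nitrogen = np.asarray(nitrogen)
    depth = np.asarray(depth)
    e1 = (nitrogen == 1).astype(float)
    e2 = (nitrogen == 2).astype(float)
    d = (depth == 1).astype(float)
    cols = [np.ones_like(e1), e1, e2, d]
    if with_interaction:
        cols += [e1 * d, e2 * d]
    return np.column_stack(cols)


def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def interaction_f_statistic(nitrogen, depth, y):
    """Partial F statistic for H0: beta_4 = beta_5 = 0."""
    y = np.asarray(y, dtype=float)
    X_full = design_matrix(nitrogen, depth, with_interaction=True)
    X_red = design_matrix(nitrogen, depth, with_interaction=False)
    rss_full, rss_red = rss(X_full, y), rss(X_red, y)
    q = X_full.shape[1] - X_red.shape[1]   # number of coefficients tested
    df_resid = len(y) - X_full.shape[1]    # residual df of the full model
    f_stat = ((rss_red - rss_full) / q) / (rss_full / df_resid)
    return f_stat, q, df_resid


# Made-up yields, two observations per nitrogen/depth cell (illustration only).
nitrogen = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
depth = [1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2]
y = [10, 12, 7, 9, 9, 11, 8, 10, 6, 8, 5, 7]

f_stat, q, df_resid = interaction_f_statistic(nitrogen, depth, y)
print(f"F = {f_stat:.4f} on ({q}, {df_resid}) degrees of freedom")
```

The statistic is compared with an F distribution on `(q, df_resid)` degrees of freedom; if SciPy is installed, `scipy.stats.f.sf(f_stat, q, df_resid)` gives the p-value. The individual t-tests depend on which levels were chosen as the reference, whereas this joint test does not.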

#### hlsmith

##### Less is more. Stay pure. Stay poor.
When examining interactions in a model, you look solely at the interaction term, with its product components (main effects) left in the model. If the interaction term is significant, you would want to keep it in the model along with its components' main effects. It is standard practice to then refrain from interpreting the main effects on their own, even though they are in the model.

It may then be best to plot or examine the interactions by stratifying the data and examining differences at the various combinations of the categorical levels for the terms in the interaction.
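The stratified look described above could be sketched as follows, using only the Python standard library. The yields are made up for illustration and the function names are hypothetical. The idea is to compute the mean yield in each nitrogen/depth cell and then compare the depth difference within each nitrogen level.

```python
from collections import defaultdict


def cell_means(nitrogen, depth, y):
    """Mean response for every (nitrogen, depth) combination."""
    groups = defaultdict(list)
    for n, d, value in zip(nitrogen, depth, y):
        groups[(n, d)].append(value)
    return {cell: sum(vals) / len(vals) for cell, vals in groups.items()}


def depth_effect_by_nitrogen(means):
    """Depth 1 minus depth 2 mean yield, within each nitrogen level."""
    levels = sorted({n for n, _ in means})
    return {n: means[(n, 1)] - means[(n, 2)] for n in levels}


# Made-up yields, two observations per cell (illustration only).
nitrogen = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
depth = [1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2]
y = [10, 12, 7, 9, 9, 11, 8, 10, 6, 8, 5, 7]

means = cell_means(nitrogen, depth, y)
effects = depth_effect_by_nitrogen(means)
for n, diff in effects.items():
    print(f"nitrogen {n}: depth 1 minus depth 2 = {diff}")
```

If the depth differences are roughly equal across nitrogen levels, the lines in an interaction plot would be close to parallel. Clearly unequal differences are what an interaction looks like in the data.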