Phonetics: Interactions without one of the main effects

#1
I am using a logistic regression model with bias reduction to predict the contrast between /b/ and /p/ in Ojibwe from acoustic correlates (ConsDur, Sonority, etc.). I am trying to arrive at the best-fitting model according to AIC.

In R I can construct a model with an interaction term but without one of the main effects in that interaction, such as:

brglm(tense~ConsDur+ConsDur:Sonority, data=..., family=binomial)

This model has the lowest AIC, but I do not know how to interpret the interaction term. In terms of my theory this makes a lot of sense: consonant duration predicts the /b/-/p/ contrast, and the prediction improves when we take into account that consonant duration varies according to voicing. Am I allowed to construct a model like this? If not, what is R telling me when I construct this model (i.e. why is it possible in R)?

thanks,

Adam
 

Mean Joe

TS Contributor
#2
This model has the lowest AIC, but I do not know how to interpret the interaction term. In terms of my theory this makes a lot of sense: consonant duration predicts the /b/-/p/ contrast, and the prediction improves when we take into account that consonant duration varies according to voicing.
It sounds to me like you do have the interpretation of the interaction term down. You would include the interaction term if you think the effect of ConsDur depends on the level/value of Sonority. For example, if the interaction coefficient is positive, the effect of ConsDur is greater when Sonority=2 than when Sonority=1. The interaction term is basically a new variable whose value is ConsDur * Sonority.

Am I allowed to construct a model like this? If not, what is R telling me when I construct this model (i.e. why is it possible in R)?
You should include the main effect for Sonority too. People will wonder what changes in effect sizes would happen if you included it. I wouldn't leave it out just because AIC is better without it.
 
#3
Thanks very much,

Yeah, I thought of that, but I am afraid of overfitting. When I said "interpret the interaction", I meant that I would not know how to calculate the predicted probability for a particular value of consonant duration in this model, because the interaction term seems to need a coefficient from both consonant duration and sonority (interaction coefficient*ConsDur coef*Sonority coef). Is there a way of calculating this? Ranked by AIC, the second-best model has ConsDur alone. So here are the models I'm comparing, ranked from lowest AIC (I'm trying to follow the multimodel inference approach in Anderson 2008 and Burnham & Anderson 2002 as much as I can).

1. brglm(tense~ConsDur+ConsDur:Sonority,...)
2. brglm(tense~ConsDur,....)
3. brglm(tense~ConsDur+Sonority+ConsDur:Sonority,...)
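For readers who want to reproduce this kind of ranking, here is a minimal sketch of an AIC comparison of the three formulas. It rests on two loud assumptions: the data are simulated stand-ins (not the Ojibwe measurements), and base R's glm() is used in place of brglm() so the example runs without extra packages. The names d, m1, m2, m3 and aic_table are made up for illustration.

```r
# Simulated stand-in data (assumption): NOT the Ojibwe measurements.
set.seed(1)
n <- 200
d <- data.frame(ConsDur = runif(n, 50, 150), Sonority = runif(n, 0, 2))
eta <- -4 + 0.08 * d$ConsDur - 0.05 * d$ConsDur * d$Sonority
d$tense <- rbinom(n, 1, plogis(eta))

# Base glm() stands in for brglm() here (assumption).
m1 <- glm(tense ~ ConsDur + ConsDur:Sonority, data = d, family = binomial)
m2 <- glm(tense ~ ConsDur, data = d, family = binomial)
m3 <- glm(tense ~ ConsDur + Sonority + ConsDur:Sonority, data = d, family = binomial)

# AIC() on several models returns a data frame with columns df and AIC.
aic_table <- AIC(m1, m2, m3)
# Delta-AIC relative to the best model in the candidate set.
aic_table$delta <- aic_table$AIC - min(aic_table$AIC)
print(aic_table[order(aic_table$AIC), ])
```

The df column makes the overfitting trade-off explicit: model 3 spends one more parameter than model 1, and AIC penalizes it by 2 for that.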

If I accept a higher AIC (option 3), there's no reason why I cannot overfit the data by just continuously adding factors, but the interaction is clearly important. Unfortunately this package does not seem to offer any other information-theoretic criterion to justify what I'm doing. Let me rephrase my question: I would like to know what exactly R is telling me with the following output.

                  Estimate Std. Error z value Pr(>|z|)
(Intercept)       -4.87321    1.90100  -2.564 0.010362 *
ConsDur            0.08247    0.02309   3.572 0.000355 ***
ConsDur:Sonority  -0.05816    0.01015  -5.730 1.00e-08 ***

Where does the coefficient for ConsDur:Sonority even come from? Is it some sort of data manipulation of ConsDur? If so, how can I get a predicted probability from these values?

p.s. I'm not surprised that Sonority has very little effect and is not significant when the interaction is added. Sonority is a measure of voicing, which is related to consonant duration for inertial motor reasons. In some languages voicing is contrastive by itself, but I have reason to believe this is not the case for Ojibwe.

Adam
 
#4
oh yeah....
Anderson, D. R. 2008. Model Based Inference in the Life Sciences: A Primer on Evidence. Springer.
Burnham, K. P. & Anderson, D. R. 2002. Model selection and multimodel inference: A practical information-theoretic approach. Springer.
 

Mean Joe

TS Contributor
#5
Let me rephrase my question: I would like to know what R is telling me exactly with the following output.

                  Estimate Std. Error z value Pr(>|z|)
(Intercept)       -4.87321    1.90100  -2.564 0.010362 *
ConsDur            0.08247    0.02309   3.572 0.000355 ***
ConsDur:Sonority  -0.05816    0.01015  -5.730 1.00e-08 ***

Where does the coefficient for ConsDur:Sonority even come from? Is it some sort of data manipulation of ConsDur? If so, how can I get a predicted probability from these values?
Understandable; I personally am not comfortable using interactions unless one of the variables is dichotomous (0/1); for me the interpretation/assumption gets a little tricky. The coefficient for ConsDur:Sonority is estimated the same way as the coefficient for ConsDur. You can see that by doing this: in your data set make a new variable, call it inter = ConsDur * Sonority. Then run your model with ConsDur and inter as the predictors. You should see the same result for inter as you did for ConsDur:Sonority. At its base the interaction is just another covariate, albeit one whose values are determined entirely by two other covariates; it is not a covariate that you can change by itself.
The thing that is tricky to me is that the interaction value is the same when ConsDur=0.1 and Sonority=30 as when ConsDur=3 and Sonority=1 (both give 3). When one of the variables is dichotomous you do not need to worry about this.
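A quick way to check the "inter" suggestion is to fit both versions and compare the coefficients. The sketch below makes the same assumptions as any toy example would: the data are simulated (not the real measurements), base glm() stands in for brglm(), and d, fit_colon, fit_manual and same are illustrative names.

```r
# Simulated stand-in data (assumption): NOT the Ojibwe measurements.
set.seed(2)
n <- 200
d <- data.frame(ConsDur = runif(n, 50, 150), Sonority = runif(n, 0, 2))
eta <- -4 + 0.08 * d$ConsDur - 0.05 * d$ConsDur * d$Sonority
d$tense <- rbinom(n, 1, plogis(eta))

# Hand-built interaction column: just the product of the two covariates.
d$inter <- d$ConsDur * d$Sonority

# Base glm() stands in for brglm() here (assumption).
fit_colon  <- glm(tense ~ ConsDur + ConsDur:Sonority, data = d, family = binomial)
fit_manual <- glm(tense ~ ConsDur + inter, data = d, family = binomial)

# Compare estimates, ignoring the differing coefficient names.
same <- all.equal(unname(coef(fit_colon)), unname(coef(fit_manual)))
print(same)
```

For two numeric variables the formula term ConsDur:Sonority produces exactly this product column in the design matrix, which is why the two fits agree.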

When I said "interpret the interaction", I meant that I would not know how to calculate the predicted probability based on a particular value of consonant duration in this model because the interaction term needs a coefficient from consonant duration and sonority (interaction coefficient*ConsDur coef*Sonority coef).
You need particular values of both consonant duration and sonority. You can't calculate a predicted probability from a value of consonant duration alone, because the predicted probability varies with the level of sonority too.
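To make that concrete: the fitted model is logit(p) = b0 + b_dur*ConsDur + b_int*ConsDur*Sonority, so each coefficient multiplies a data value, and the coefficients are never multiplied by each other. Here is a minimal sketch using the estimates from the posted output. The input values (ConsDur = 100, Sonority = 1) are hypothetical, since the thread does not give the measurement units, and predict_tense is an illustrative name. Which category counts as the "1" outcome depends on how tense was coded, which the thread does not say.

```r
# Estimates copied from the posted output.
b0    <- -4.87321   # (Intercept)
b_dur <-  0.08247   # ConsDur
b_int <- -0.05816   # ConsDur:Sonority

# Predicted probability for a given pair of covariate values.
predict_tense <- function(cons_dur, sonority) {
  eta <- b0 + b_dur * cons_dur + b_int * cons_dur * sonority  # linear predictor (log-odds)
  plogis(eta)                                                 # inverse logit
}

# Hypothetical inputs (assumption): units are not stated in the thread.
p <- predict_tense(cons_dur = 100, sonority = 1)
print(p)

# The slope of ConsDur on the log-odds scale depends on Sonority:
slope_at <- function(sonority) b_dur + b_int * sonority
print(slope_at(c(0, 1, 2)))
```

Rewriting the linear predictor as b0 + (b_dur + b_int*Sonority)*ConsDur shows what dropping the Sonority main effect implies: Sonority only ever acts by changing the ConsDur slope, and it has no effect at all when ConsDur is 0.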

Working through an example calculation with your output: start at ConsDur=0.1 and Sonority=30 (so ConsDur*Sonority=3). Now increase ConsDur by 1, to 1.1, and decrease Sonority by 26.4, to 3.6. The product ConsDur*Sonority rises by 0.96, to 3.96, so the linear predictor (the value that gets converted into the predicted probability) changes by 1*0.082 + 0.96*(-0.058) = +0.026. I will just repeat here that this is tricky to me when neither of the variables is dichotomous. For a +1 increase in ConsDur and a +1 increase in Sonority, the end effect depends on what levels of ConsDur and Sonority you started with, because the interaction term will change by varying amounts.
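The arithmetic above can be checked in a few lines. This sketch uses the full-precision estimates from the posted output; the two starting points used for the "+1 and +1" comparison, (1, 1) and (10, 10), are hypothetical values chosen only to illustrate the point, and linpred_part and step_both are illustrative names.

```r
b_dur <-  0.08247   # ConsDur estimate from the posted output
b_int <- -0.05816   # ConsDur:Sonority estimate from the posted output

# Part of the linear predictor that depends on the covariates
# (the intercept cancels when taking differences).
linpred_part <- function(cons_dur, sonority) {
  b_dur * cons_dur + b_int * cons_dur * sonority
}

# The worked example: (0.1, 30) -> (1.1, 3.6); the product goes from 3 to 3.96.
delta <- linpred_part(1.1, 3.6) - linpred_part(0.1, 30)
print(round(delta, 3))

# A +1 step in BOTH variables from two hypothetical starting points (assumption).
step_both <- function(cons_dur, sonority) {
  linpred_part(cons_dur + 1, sonority + 1) - linpred_part(cons_dur, sonority)
}
print(step_both(1, 1))    # product changes by 3
print(step_both(10, 10))  # product changes by 21
```

The two step_both() results differ because the product ConsDur*Sonority grows by a different amount depending on where you start, which is exactly the difficulty described above.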


Yeah, I thought of that, but I am afraid of overfitting. Ranked by AIC, the second-best model has ConsDur alone. p.s. I'm not surprised that Sonority has very little effect and is not significant when the interaction is added. Sonority is a measure of voicing, which is related to consonant duration for inertial motor reasons. In some languages voicing is contrastive by itself, but I have reason to believe this is not the case for Ojibwe.
Right, adding Sonority to the model may be only adding a statistically non-significant term. But people like to at least see that you considered it as well. It will give a fuller picture, show that you are not trying to obscure something to get a certain result. Maybe you could present two models, then point out that the one with Sonority is really rubbish, so you're just going to go forward with predicted probabilities based on the model without Sonority?