Misspecified linear model, interpretation of slope

fed2

Active Member
#1
Suppose you fit a simple linear regression when the data actually follow some sort of sigmoid, a four-parameter logistic or similar.

To what extent is the following statement true/false:
"The estimated slope of the simple linear regression is the average slope of the sigmoid"


Seems intuitively true, doesn't it?
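To make the statement checkable, here is one concrete reading of "average slope of the sigmoid" -- the mean of the curve's derivative over the observed x range -- lined up against the slope of a straight-line fit. The particular choices below (standard logistic curve, x uniform on (-1, 1), a little Gaussian noise) are only for illustration, not part of the question.

R:
# One concrete reading of the statement (illustrative choices only).
g     <- function(x) 1 / (1 + exp(-x))   # the sigmoid
g_dot <- function(x) g(x) * (1 - g(x))   # its derivative

set.seed(1)
x <- runif(1e4, -1, 1)
y <- g(x) + rnorm(length(x), 0, 0.1)

slr_slope <- coef(lm(y ~ x))[[2]]   # slope of the (misspecified) simple linear regression
avg_slope <- mean(g_dot(x))         # "average slope of the sigmoid" over the observed x

c(slr = slr_slope, average = avg_slope)

Whether, and how closely, those two numbers agree is basically the question.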
 

hlsmith

Less is more. Stay pure. Stay poor.
#2
My brain can't even process it. It feels like there are latent and cloaked issues, but maybe not. I picture the sigmoid shape and a single line going through it. So what does "average slope of the sigmoid" even mean -- would you weight piecewise segments and create a single number? Well, if linear regression can be used to fit binary data, I suppose it could be true, maybe.
 

fed2

Active Member
#3
I guess the question is pretty vague. I got to thinking about it, though, and I realized that if you take a nonlinear function g(x), consider the expectation of the usual linear regression slope betahat = Sxy/Sxx, replace Y with its expected value g(x), and approximate g(x) with a first-order Taylor expansion, then you end up with something like
E(betahat) = slope of g at mean X + bias term

where the bias term is the slope of the linear regression relating the remainder term of the Taylor approximation to X.
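Spelling that out a bit (my own sketch of the argument, assuming Y = g(X) + noise with mean-zero noise independent of X, expanding g around the design mean x-bar, and writing R(X) for the Taylor remainder):

LaTeX:
g(X) = g(\bar{x}) + g'(\bar{x})\,(X - \bar{x}) + R(X)

E[\hat{\beta}] \approx \frac{\mathrm{Cov}\big(g(X),\,X\big)}{\mathrm{Var}(X)}
             = g'(\bar{x}) + \frac{\mathrm{Cov}\big(R(X),\,X\big)}{\mathrm{Var}(X)}

The second term is the population slope of a regression of the remainder R(X) on X, which is the bias term above.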

I swear I've seen this sort of theory somewhere before, but it is hard to google up again.

Maybe some R code explains it better:

R:
#logistic fun
g = function(x){
  1/( 1 + exp(-x) )
}
#first derivative
g_dot = function(x){
  g(x)*(1 - g(x) )
}

#first order approximation
tangent = function(x,a){
    g(a) + g_dot(a)*(x - a)
}

#fit slope to logistic "experiment"
runSim <- function(j){
    X <- runif(10, -1, 1)
    gX <- g(X) + rnorm(10, 0, .1)
    betaHat <- coef( lm(gX ~ X) )[[2]]   #fit linear regression to the logistic data
    #the bias is the slope of the SLR relating the Taylor remainder (g minus its tangent) to X
    errs <- gX - tangent(X, 0)
    bias <- coef( lm(errs ~ X) )[[2]]
    data.frame( betaHat = betaHat, bias = bias )
}


mySims =  do.call( 'rbind', lapply(1:1000, runSim)  )

print(  sprintf( 'expected slope at mean X = %f', g_dot(0) )  )

print(   sprintf( 'slope of slr %f', mean(mySims$betaHat  )  )  )

print(   sprintf( 'predicted bias %f', mean(mySims$bias  )  )  )

#so fitting the linear regression gives g_dot(0) plus a bias related
#to the error of the first-order Taylor approximation?
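
To connect this back to post #1 (my own addition, reusing g, g_dot and mySims from the code above): if "average slope of the sigmoid" means the average of g'(X) over the design, then for X uniform on (-1, 1) that average has the closed form (g(1) - g(-1))/2, which can be lined up against the simulated SLR slope and g_dot(0).

R:
#average slope of the sigmoid over the design, X ~ Unif(-1, 1):
#E[ g'(X) ] = (1/2) * integral of g_dot(x) dx over (-1, 1) = ( g(1) - g(-1) ) / 2
avgSlope <- ( g(1) - g(-1) ) / 2
print( sprintf( 'average slope of sigmoid over (-1,1) = %f', avgSlope ) )
print( sprintf( 'mean fitted SLR slope = %f', mean(mySims$betaHat) ) )

So the original statement can be checked directly against the simulation rather than argued about in the abstract.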
 

hlsmith

Less is more. Stay pure. Stay poor.
#4
Yes, I also wondered how this might play out in a sim. I look forward to reviewing your code when I get back to the office on Monday.