Incremental update to *existing* Logistic Regression model?

#1
Hi,

In our work, we have a large-scale logistic regression model that we hate having to retrain every day. Is there a way we can update the existing model with the new day's data instead?

Currently, we retrain the model daily on a ninety-day sliding window of data, with an L1 regularizer.

Using the existing model as a starting point, we could incrementally update it with the new data, as in existing online training methods. But how do we enforce the L1 regularizer in that setting?
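To make the question concrete, here is the kind of update I have in mind: a stochastic gradient pass over the new day's data, starting from yesterday's weights. The soft-thresholding step is my naive guess at how to enforce the L1 penalty online; I have no idea whether the result tracks the batch L1 solution. (Rough sketch in Python/NumPy.)

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def online_l1_update(w, X_new, y_new, lr=0.01, lam=1e-4):
    """One SGD pass over the new day's data, starting from the existing
    weights w. Labels are assumed to be 0/1. After each gradient step,
    soft-threshold the weights -- my guess at an online L1 penalty."""
    for x, y in zip(X_new, y_new):
        p = sigmoid(np.dot(w, x))
        w = w - lr * (p - y) * x                                # log-loss gradient step
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # L1 soft-thresholding
    return w
```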

I have been searching for publications on this for a while, but have found nothing so far. Can anyone shed some light on this?

Thanks much,
-Peter
 
#2
Also, an alternative could be to train a small model on the new day's data alone, and then merge it into the old model. But how would we merge the two? Any ideas?
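The only merge I can think of is a data-weighted average of the two coefficient vectors, something like the sketch below. This is pure guesswork on my part; I doubt averaging preserves either model's sparsity pattern or accuracy.

```python
import numpy as np

# Hypothetical merge: convex combination of old and new weight vectors,
# weighted by how many examples each model was trained on. Just a guess.
w_old, n_old = np.array([0.4, 0.0, -1.2]), 90_000
w_new, n_new = np.array([0.5, 0.1, -1.0]), 1_000

alpha = n_new / (n_old + n_new)
w_merged = (1.0 - alpha) * w_old + alpha * w_new
print(w_merged)
```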

-Peter
 
#3
It seems to me highly unlikely that this is possible. Even without a regularization constraint, logistic regression is a nonlinear optimization problem. It already has no analytic solution, which is usually a prerequisite for deriving an update rule. With a regularization constraint, it becomes a constrained optimization problem, which introduces a whole new set of non-analytic complications on top of the ones the unconstrained problem already had.
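To be concrete about what an "update" would have to track: the training problem here is (writing labels as $y_i \in \{-1,+1\}$)

$$\min_w \; \sum_{i=1}^{n} \log\!\left(1 + e^{-y_i\, w^\top x_i}\right) + \lambda \lVert w \rVert_1,$$

and an incremental update would mean following the minimizer as terms enter and leave the sum. For squared loss that can be done exactly (recursive least squares); the logistic loss has no comparable closed form.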

By the way, what's up with the machine learning people and their weird "regularization" constraints? I keep seeing it explained as a way to "avoid overfitting". Statisticians have this same problem, and have what appears to me to be a much cleaner, more analytically justified approach: we choose to include or exclude each new variable in the regression based on whether it contributes to the fit in a statistically significant way, as measured by a likelihood-ratio test. By contrast, the "regularization" approach includes all variables, but adds an arbitrary, ad hoc constraint on their coefficient magnitudes. Is there some first-principles justification for this approach?
 
#4
> By the way, what's up with the machine learning people and their weird "regularization" constraints? [...] Is there some first-principles justification for this approach?
Single-feature significance tests fail to capture binary (or n-ary) interactions between features, the textbook example being XOR. Some machine learning people use regularization as their feature-selection approach. My understanding is that ideally they would use an L0 regularizer, which directly constrains the number of non-zero weights; since that turns out to be NP-hard, they settle for L1, which is a convex surrogate for L0.
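A toy way to see why L1 (unlike L2) actually drives weights to exactly zero: the proximal operator of the L1 penalty is soft-thresholding, whereas the prox of a squared-L2 penalty merely rescales. (Illustration of mine, not from any paper mentioned here.)

```python
import numpy as np

w = np.array([0.8, 0.05, -0.03, 1.2])
lam = 0.1

# L1 prox: soft-thresholding -- small weights become exactly zero.
l1 = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)
# L2 prox: uniform shrinkage -- weights get smaller but stay non-zero.
l2 = w / (1.0 + lam)

print(l1)  # [ 0.7   0.   -0.    1.1 ]
print(l2)  # [ 0.727  0.045 -0.027  1.091]
```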

Andrew Ng has a paper, "Feature selection, L1 vs. L2 regularization, and rotational invariance" (ICML 2004), arguing that L1 deals with irrelevant features better than L2 and scales better with their number.
 
#5
Thanks for writing back to explain. The idea of L1 or L2 as an approximation to L0 is indeed illuminating, and makes the approach seem a bit less strange to me.

But only a bit. An L0 regularizer would essentially fix the number of variables to include in the regression, yes? But why would the trainer think he knows a priori the right number of variables to use? Shouldn't he add as many variables as significantly improve the fit and leave out all the others? How would he know beforehand whether that number would be 5 or 25?

I will try to read the Ng paper you cite.
 
#7
Dear Peacherwoo,

This is an interesting discussion. May I ask whether you have found a solution to this problem? I am also researching a related (if not the same) topic, and it would be nice if you could share why you would like to do something like this; that is, what is your application here?

Thanks a lot for sharing with us.