Date variables in predictive modeling.

#1
This is more of a hypothetical question, as I haven't gathered the data yet. Suppose I have monthly data covering the last two years and I build a predictive model that uses the month and year as variables. Am I automatically introducing a time component, so that I should use some ARIMA-type model? Or can I treat month/year as simply categorical? For the end product I don't need to plug in values for past months/years, so I guess the problem isn't really about forecasting. But it's important that the model generalizes well with the other variables in the model (obviously). I'm thinking of excluding the date variables altogether, since the response variable (repair or no repair) is relatively stable over time. I plan to use some kind of tree model to capture the nonlinearity in my data. Thanks
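If month and year are treated as plain categorical labels, a quick way to check the claim that the repair rate is "relatively stable over time" is to tabulate the rate for each (year, month) cell before deciding whether to drop the date variables. The sketch below assumes a hypothetical record layout of `(year, month, repaired)` tuples; the function name and toy data are illustrative only.

```python
from collections import defaultdict


def monthly_repair_rates(records):
    """Return {(year, month): share of records with a repair}, sorted by date.

    `records` is assumed (hypothetically) to be an iterable of
    (year, month, repaired) tuples, where `repaired` is a bool.
    """
    counts = defaultdict(lambda: [0, 0])  # (year, month) -> [repairs, total]
    for year, month, repaired in records:
        cell = counts[(year, month)]
        cell[0] += int(repaired)
        cell[1] += 1
    return {key: repairs / total for key, (repairs, total) in sorted(counts.items())}


# Hypothetical toy data: two months, four records each.
records = [
    (2023, 1, True), (2023, 1, False), (2023, 1, False), (2023, 1, False),
    (2023, 2, True), (2023, 2, True), (2023, 2, False), (2023, 2, False),
]
rates = monthly_repair_rates(records)
print(rates)  # {(2023, 1): 0.25, (2023, 2): 0.5}
```

With roughly 24 monthly cells, a flat table supports dropping the date variables. A visible drift or seasonal pattern suggests that time carries information a tree model could exploit (or that the series needs the time-series checks discussed below).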
 

noetsi

#2
The answer depends on:

1) whether your data are stationary,
2) whether you have autocorrelation, and
3) whether you are predicting with lags (such as t1 predicting t2).
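Points 2 and 3 can be checked roughly on the monthly response series itself. A minimal sketch in plain Python, assuming a hypothetical list of 24 monthly repair rates: it computes the sample autocorrelation function and compares it with the usual approximate 95% band of ±1.96/√n for white noise, and it builds (x_t, x_{t+1}) pairs of the kind used when a lagged value predicts the next one. A formal stationarity test (e.g. augmented Dickey-Fuller) is not attempted here.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations r_1..r_max_lag (standard estimator, same divisor top and bottom)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


def lag_pairs(series, lag=1):
    """Pair each value with the value `lag` steps later: (x_t, x_{t+lag}); lag must be >= 1."""
    return list(zip(series[:-lag], series[lag:]))


# Hypothetical 24 monthly repair rates with a slow upward drift.
rates = [0.20 + 0.005 * t for t in range(24)]
r = acf(rates, 3)
bound = 1.96 / math.sqrt(len(rates))  # rough 95% band for white noise
print([round(v, 3) for v in r], round(bound, 3))
```

For this drifting series r_1 is 0.875, far outside the band of about 0.40. An ACF that stays high and decays slowly like this is a common sign of trend or other nonstationarity, which is exactly the situation where treating month/year as unrelated categories can mislead.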

ARIMA does not work well with multiple variables IMHO. You can do it, but because the input series have to be prewhitened first, it is painful. I would use another type of time-series model.
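For a single input series, prewhitening means fitting a model to the input, filtering both the input and the output with that same model, and only then reading their cross-correlations. A minimal sketch in plain Python, assuming for simplicity that an AR(1) model is adequate for the input (real data usually needs a fuller ARIMA identification) and using hypothetical simulated series in which y responds to x one step later:

```python
import math
import random


def ar1_coef(x):
    """Least-squares AR(1) coefficient for the mean-centred series x."""
    m = sum(x) / len(x)
    d = [v - m for v in x]
    num = sum(d[t] * d[t - 1] for t in range(1, len(d)))
    den = sum(d[t - 1] ** 2 for t in range(1, len(d)))
    return num / den


def ar1_filter(series, phi):
    """Apply (1 - phi*B) to the mean-centred series; the output is one element shorter."""
    m = sum(series) / len(series)
    d = [v - m for v in series]
    return [d[t] - phi * d[t - 1] for t in range(1, len(d))]


def cross_corr(a, b, lag):
    """Pearson correlation between a_t and b_{t+lag}, for lag >= 0."""
    pairs = list(zip(a, b[lag:]))
    ma = sum(p for p, _ in pairs) / len(pairs)
    mb = sum(q for _, q in pairs) / len(pairs)
    cov = sum((p - ma) * (q - mb) for p, q in pairs)
    va = sum((p - ma) ** 2 for p, _ in pairs)
    vb = sum((q - mb) ** 2 for _, q in pairs)
    return cov / math.sqrt(va * vb)


# Hypothetical data: x is AR(1) with phi = 0.8, and y_t = 2 * x_{t-1} + small noise.
rng = random.Random(42)
n = 200
x = [0.0]
for _ in range(n - 1):
    x.append(0.8 * x[-1] + rng.gauss(0, 1))
y = [rng.gauss(0, 0.1)] + [2 * x[t - 1] + rng.gauss(0, 0.1) for t in range(1, n)]

# Prewhiten: fit AR(1) to the input, then apply the SAME filter to both series.
phi = ar1_coef(x)
alpha = ar1_filter(x, phi)
beta = ar1_filter(y, phi)
cc = [cross_corr(alpha, beta, k) for k in range(4)]
print(round(phi, 2), [round(c, 2) for c in cc])
```

After filtering, the cross-correlation should spike at the true lag (1 here) and sit near zero elsewhere. Without prewhitening, the input's own autocorrelation smears the relationship across neighbouring lags. That extra identification step, repeated for every input, is what makes multi-input ARIMA tedious.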