Fundamentally, you get different answers because stepwise regression, whether forward or backward, is not a strictly justifiable procedure.

More directly, what is going on is usually a multicollinearity effect. Suppose we have three factors, which we'll call A, B, and C. A and C strongly affect your outcome. B doesn't directly affect your outcome, but its occurrence is strongly correlated with C, so if you don't control for C, it will look like B strongly affects your outcome. Now imagine that you collect a data set that does not have enough instances with differing values of B and C to let you see the effect of one while controlling for the other.
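To make this concrete, here is a minimal simulation of that setup. The variables A, B, C and every number in it are hypothetical, chosen only to reproduce the pattern described above: A and C drive the outcome, and B is a near-copy of C.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# A and C genuinely drive the outcome; B has no direct effect but is
# a near-copy of C, so the sample barely distinguishes B from C.
A = rng.normal(size=n)
C = rng.normal(size=n)
B = C + 0.05 * rng.normal(size=n)
y = 2.0 * A + 2.0 * C + rng.normal(size=n)

# Marginally, B looks strongly related to y -- it is a proxy for C.
corr_BC = np.corrcoef(B, C)[0, 1]
corr_By = np.corrcoef(B, y)[0, 1]
print(corr_BC, corr_By)
```

Without controlling for C, B shows a strong marginal correlation with y even though it has no direct effect at all.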

Suppose we do a forward stepwise regression. If we do this by testing the significance of each factor individually and then adding in all that pass the test, our final model will be (A,B,C) because each affects the outcome when tested individually. If we do this by testing and immediately adding factors that significantly improve the fit, our final model will be either (A,B) or (A,C), depending on whether we tested B or C first, because once you have one, adding the other doesn't significantly improve the fit.

Now suppose we do a backward stepwise regression. We do a fit with all three factors and look at which coefficients are significantly non-zero. But since our data do not allow us to distinguish B from C, the joint confidence region will include a zero B coefficient (with a strong covariance indicating that the C coefficient is non-zero in this case) and a zero C coefficient (with a strong covariance indicating that the B coefficient is non-zero in this case). Since the confidence regions for both coefficients include zero (and the stepwise procedure ignores covariances), we will throw out both B and C and end up with the model (A).
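The backward step can be sketched the same way: fit all three factors at once and inspect the coefficient standard errors and their covariance (again on hypothetical simulated data; the usual OLS coefficient covariance is the inverse of X'X scaled by the residual variance):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
A = rng.normal(size=n)
C = rng.normal(size=n)
B = C + 0.05 * rng.normal(size=n)      # B is a near-copy of C
y = 2.0 * A + 2.0 * C + rng.normal(size=n)

# Full fit: columns are intercept, A, B, C.
X = np.column_stack([np.ones(n), A, B, C])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
cov = sigma2 * np.linalg.inv(X.T @ X)   # coefficient covariance matrix
se = np.sqrt(np.diag(cov))

# Collinearity inflates the B and C standard errors far beyond A's,
# and makes the B and C estimates almost perfectly anticorrelated:
# the data pin down beta_B + beta_C, but not the split between them.
se_ratio = se[2] / se[1]
bc_corr = cov[2, 3] / (se[2] * se[3])
print(se, bc_corr)
```

A procedure that looks only at the individual standard errors sees two "insignificant" coefficients; the near-perfect negative correlation between them, which says "one of these two matters", lives in the off-diagonal term it ignores.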

I hope that illustrates the vagaries of the stepwise procedures. As to which is best? Well, not throwing away any factors, no matter how insignificant their coefficients, will give you the best fit. It won't necessarily give you the best P-value, though, since the P-value depends not only on the residual but also on the number of parameters you used to get it. Neither the forward nor the backward procedure is guaranteed to give you the best P -- one of them might, or the factor set that gives the best P might not be discovered by either.

But you are not supposed to do statistics by jiggering your analysis to get the best P -- that's a sure way to get a false positive. What you should really do is decide which factors you will consider in your regression *before* you analyze your data. If exploratory analysis indicates that you should have included one you left out, then you start over and collect new data.
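To illustrate the point above that the best fit need not give the best P: the overall-regression F statistic divides the explained variance by the number of parameters spent on it, so adding a useless factor raises R² slightly while lowering F. A sketch on the same kind of hypothetical simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
A = rng.normal(size=n)
C = rng.normal(size=n)
B = C + 0.05 * rng.normal(size=n)      # B is a near-copy of C
y = 2.0 * A + 2.0 * C + rng.normal(size=n)

def overall_f(cols, y):
    """Overall-regression F statistic: explained variance per parameter."""
    n, p = len(y), len(cols)
    X = np.column_stack([np.ones(n)] + list(cols))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    r2 = 1 - (r @ r) / ((y - y.mean()) ** 2).sum()
    return (r2 / p) / ((1 - r2) / (n - p - 1))

f_AC = overall_f([A, C], y)
f_ACB = overall_f([A, B, C], y)
print(f_AC, f_ACB)
```

The (A, B, C) model fits marginally better than (A, C) yet scores a smaller F, because the tiny gain in fit doesn't pay for the extra parameter.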