In our models, the dependent variable is the audit fees paid to the auditor by the audit client, in dollars per year (a continuous variable). We use regression to analyze the data.

Our issue is how to measure one predictor variable: restatement by the client company (correction of a reporting error from a prior year). Prior studies generally use one of the two measures listed below:

*Restatement (binary).* Did the company restate its financial statements? Yes/no, coded 1 or 0.

*Restatement (ratio -- restatement dollar amount / income of company in dollars).* If the company restates its financial statements, what is the relative effect on income? If the company does not restate, the ratio is zero. In our sample, about 70% of companies do not restate, so those observations have a ratio of zero.

We need to choose between three models:

*LnAFt*: natural log of audit fees

(1) *LnAFt* = α0 + α1 × *Restatement (binary, yes/no)* + various control variables

(2) *LnAFt* = α0 + α1 × *Restatement (ratio)* + various control variables

(3) *LnAFt* = α0 + α1 × *Restatement (binary, yes/no)* + α2 × *Restatement (ratio)* + various control variables

Which model do you recommend? Or is there another way to model the restatement effect?

For model (2), would regression yield a biased estimate of α1 (the coefficient on the restatement ratio), because most of the sample has no restatement and therefore a ratio of zero? According to our theoretical arguments, a restating company with a very small ratio would have significantly higher fees than a company with no restatement. (The error may have been relatively small in dollars, but now the integrity of the financial accounting system is in question.)
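To make this concern concrete, here is a minimal simulation sketch in plain Python. Everything in it is an assumption for illustration rather than an estimate from our data: a 30% restatement rate (matching our sample), a fee jump of 0.5 for any restatement, a slope of 2.0 on the ratio, ratios drawn uniformly between 0 and 0.2, and no control variables. `ols` is a small hypothetical helper written for this post, not a library function. The sketch fits the ratio-only specification (model (2)) and the specification with both the binary indicator and the ratio (the third model) to the same simulated data.

```python
import random

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved with Gauss-Jordan elimination. X is a list of rows."""
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

random.seed(0)
# Assumed "true" values, chosen only for illustration.
BASE, JUMP, SLOPE = 10.0, 0.5, 2.0
n = 5000

binary, ratio, ln_af = [], [], []
for _ in range(n):
    restated = 1.0 if random.random() < 0.30 else 0.0  # about 30% restate
    r = restated * random.uniform(0.0, 0.2)            # ratio is 0 for non-restaters
    binary.append(restated)
    ratio.append(r)
    ln_af.append(BASE + JUMP * restated + SLOPE * r + random.gauss(0.0, 0.3))

# Model (2): intercept + ratio only.
m2 = ols([[1.0, r] for r in ratio], ln_af)
# Third model: intercept + binary indicator + ratio.
m3 = ols([[1.0, b, r] for b, r in zip(binary, ratio)], ln_af)

print("model (2) slope on ratio:", round(m2[1], 2))
print("third model jump, slope: ", round(m3[1], 2), round(m3[2], 2))
```

Under these assumptions, the ratio-only slope comes out well above the assumed 2.0, because the ratio has to absorb the jump between non-restaters and restaters, while the specification with both terms recovers values close to the assumed jump and slope. Whether our real data behave this way depends on whether the jump at zero actually exists.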

Or should we use the first model above, and then split the sample by restatement (yes/no) and run model (2) on the companies that restate? Preliminary analysis with our sample suggests that, for this group, the higher the ratio, the larger the LnAF.
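One way to see how the third model relates to this split-sample idea is to write out what it implies for each group. This is only a sketch in notation we introduce here: $D_t$ for the binary indicator, $R_t$ for the ratio, and $X_t'\gamma$ for the control variables. It relies on the fact that $R_t = 0$ whenever $D_t = 0$.

```latex
\begin{align*}
\text{Model (3):}\quad
  \mathit{LnAF}_t &= \alpha_0 + \alpha_1 D_t + \alpha_2 R_t + X_t'\gamma + \varepsilon_t \\
D_t = 0:\quad
  E[\mathit{LnAF}_t \mid X_t] &= \alpha_0 + X_t'\gamma \\
D_t = 1:\quad
  E[\mathit{LnAF}_t \mid R_t, X_t] &= (\alpha_0 + \alpha_1) + \alpha_2 R_t + X_t'\gamma
\end{align*}
```

Read this way, α1 would be the fee difference for a restater whose ratio is close to zero, and α2 the slope on the ratio among restaters only. The difference from the split-sample approach is that the third model constrains the control coefficients to be the same in both groups, whereas separate regressions let them differ.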

Any insight into this modelling issue would be appreciated.