Understanding how these regression results are reported

BeetRoot

New Member
I can't work out what is being reported here; this is from a backward stepwise regression. What's the difference between 'showing a significant contribution to the model' and 'making a significant contribution in accounting for the error rates', and why are the two variables reported together for the second? I'm including this paper in a review and I think that the significance of the results is being overstated in some places, but I'm not sure if that's just because I'm not understanding what was done. I don't know what to say about this in particular. Very grateful for any thoughts or further reading recommendations.

This is the paper - https://pubmed.ncbi.nlm.nih.gov/12096871/

This is the bit I'm most confused by - “Both the remaining variables—log frequency and log density—showed significant contributions to the model (p < .001), so no further steps were taken. In the resulting model, the two variables together made a significant contribution in accounting for the error rates (F(2, 172) = 29.72, p < .001)”

Whole paragraph for context:

"The aim of the regression analysis was to reveal which of these variables appear to be the most important predictors of naming accuracy. Because of the missing neighborhood frequency values for zero-density items, this variable was excluded so that the regression analysis could be conducted on the full set of PNT stimuli, but the correlations described above suggest that the exclusion of neighborhood frequency does not sacrifice much predictive power from the regression model. Thus, the log- transformed error rates were regressed on the log values of item frequency and neighborhood density and on the number of syllables of each item. In the original model, the three variables together accounted for 26.0% of the variance in error rates (R = .510). In the next step, the variable contributing the least amount to the model—number of syllables—was removed, with negligible change in the overall R (from .510to .507). Both the remaining variables—log frequency and log density— showed significant contributions to the model (p <.001), so no further steps were taken. In the resulting model, the two variables together made a significant contribution in accounting for the error rates (F(2172) = 29:72, p <:001); however, it should be noted that they still accounted for only about 26% of the variance in PNT accuracy. It seems logical that this reflects the influence of other factors on naming accuracy, since the error rate used in the analysis includes other types of errors, such as semantic errors, descriptions, and no responses."

noetsi

Fortran must die
First off, having not read the whole article, but being painfully acquainted with the issue of what I call relative impact, I don't think regression is designed to show "what are the most important predictors...." P values don't show this, nor do slopes unless everything is on the same scale (and maybe not even then). It is not easy to say from a regression that this variable has more impact than that one. You can only say that a variable likely did or did not have an impact. And stepwise regression is full of doubtful assumptions. Which variables you add to or remove from the model, and in what order, makes a huge difference. They appear to be assessing the impact of a variable by how much it changed R squared rather than by a formal test (so they were making a judgement on whether a given change was negligible, rather than formally testing it).
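
To put a rough number on that last point, here is a minimal sketch of the formal test for a change in R squared, computed only from the figures in the quoted paragraph (R = .510 with three predictors, R = .507 with two). It assumes n = 175 items, which is inferred by reading the quoted statistic as F(2, 172), and it assumes the model includes an intercept. The function name `f_change` is purely illustrative; none of this comes from the paper's own analysis.

```python
def f_change(r_full, r_reduced, n, k_full, k_dropped=1):
    """F statistic for dropping k_dropped predictors from a model
    that has k_full predictors plus an intercept."""
    r2_full = r_full ** 2
    r2_reduced = r_reduced ** 2
    df_resid = n - k_full - 1  # residual df of the full model
    return ((r2_full - r2_reduced) / k_dropped) / ((1 - r2_full) / df_resid)

# Figures quoted from the paper; n = 175 is an assumption inferred from df = 172.
f = f_change(r_full=0.510, r_reduced=0.507, n=175, k_full=3)
print(f"F change for removing number of syllables: {f:.2f}")
```

With these inputs the F for dropping number of syllables comes out below 1, far short of the roughly 3.9 needed for significance at the .05 level with 1 and 171 degrees of freedom. So the removal looks defensible, even though the quoted paragraph does not report such a test.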

I think error rate here is the dependent variable, i.e. what they are predicting, which is a bit confusing. When they say "Both the remaining variables—log frequency and log density— showed significant contributions to the model (p <.001), so no further steps were taken" I think they are talking about either the stepwise removal test or the t test of each slope. This is a formal test of statistical significance. Personally I don't agree with stepwise and would therefore disagree that there is any statistical test that shows a variable makes a significant contribution. I don't feel a t test does this either. To make this decision you have to decide if the effect size is large enough to matter, which is a judgement, not a statistical test.
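
On the second statement in the question (the two variables "together"), that reads like the overall F test for the whole model, which is a separate thing from the per-variable tests. As a rough check, the overall F can be recovered from the reported R alone. The sketch below assumes n = 175 items (inferred by reading the quoted statistic as F(2, 172)) and a model with an intercept; `overall_f` is an illustrative name, not anything from the paper.

```python
def overall_f(r, n, k):
    """Overall F statistic for a regression with k predictors plus an
    intercept, computed from the multiple correlation R."""
    r2 = r ** 2
    df_model = k
    df_resid = n - k - 1
    return (r2 / df_model) / ((1 - r2) / df_resid)

# R = .507 is quoted from the paper; n = 175 is an assumption inferred from df = 172.
f = overall_f(r=0.507, n=175, k=2)
print(f"Overall F(2, 172) = {f:.2f}")
```

This gives a value just under 30, close to the reported 29.72; a small gap is expected because R is only quoted to three decimals. In other words, the "together" result is just the test that the two-predictor model explains more than nothing, and with R squared around .26 it says little about how large or important either effect is.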