Simple Linear Regression Qs

Hi there, I am practicing exam questions where we are presented with Minitab outputs and have to answer questions about them. I have attempted all the questions below but I am unsure if all of my answers are correct. Thanks in advance to whoever checks out my answers!

Sixteen well-trained male middle- and long-distance runners performed a 3 km time trial and a number of running tests in the laboratory where their v-4mM blood lactate marker was recorded (i.e. the running velocity achieved at a blood lactate concentration of 4 mmol/L).

All the laboratory testing took place on a motorised treadmill while distance running performance was determined by 3 km time trials on an indoor 200m track.

The aim of the study was to investigate whether there is sufficient evidence of a dependency of 3 km running time on v-4mM in the population of male runners of interest in order to use their blood lactate markers to predict their 3km running time. A scatterplot (with line of best fit) is provided, as is output from a regression analysis carried out on these data.

i) What are the slope and intercept of the least-squares line?

Slope = -0.3729

Intercept = 15.822

ii) What is the correlation between v-4mM and 3km finishing time?

The correlation is negative: as v-4mM increases, 3km finishing time tends to decrease.
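The question probably wants a number as well as a direction. In simple linear regression the correlation is the square root of R-sq, taking the sign of the slope, so it can be worked out from the values quoted in parts i and iv. A minimal sketch, assuming those two figures are read correctly from the output (the variable names are just for illustration):

```python
import math

r_sq = 0.8572      # R-sq from the output (85.72%), as quoted in part iv
slope = -0.3729    # fitted slope from part i

# For a single predictor, |r| = sqrt(R-sq) and r has the same sign as the slope.
r = math.copysign(math.sqrt(r_sq), slope)
print(round(r, 3))  # about -0.926
```

That would be a strong negative linear relationship between v-4mM and 3km time.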

iii) Based on the p-values presented, explain why there is evidence that v4mM is a significant predictor of 3km finishing?

As the p-values presented are less than 0.05, the results are significant at the 5% level, so we can reject the null hypothesis that the slope is zero and conclude that v4mM is a significant predictor of 3km finishing time.

iv) Provide an interpretation of the R-sq statistic in terms of how useful is v4mM as a predictor of 3km finishing time.

The R-sq statistic measures the percentage of variability in the response variable that is explained by the regression. Here, the R-sq statistic is 85.72%, which means over 85% of the variability in 3km time is explained by v4mM.

v) Provide an interpretation of the S statistic (highlighted in bold) in terms of using this model to predict 3km finishing time.

The S statistic is the estimated standard deviation of the 3km times about the regression line. This means that, whatever the value of v-4mM, the 3km time is likely to lie within plus or minus twice the S statistic of the fitted value, which is 15.822 – 0.3729 times the particular v-4mM.
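For reference, this is how S is usually defined for a simple linear regression. The symbols are my own notation rather than anything taken from the output: y_i is an observed 3km time, \hat{y}_i is the corresponding fitted value, and n = 16 runners.

```latex
S = \sqrt{\frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{n - 2}},
\qquad n = 16 \;\Rightarrow\; n - 2 = 14 \text{ degrees of freedom}
```

So S is in the same units as the response (minutes), which is why it can be read as a typical size of prediction error.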

vi) A particular athlete recorded a v-4mM of 17 in the lab prior to a 3km event. Use the output below to provide a range of predicted values for his likely finishing time.

15.822 – 0.3729(17) = 9.4827 mins

9.4827 +/- 2(0.291111) = (8.90, 10.06) mins
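The arithmetic above can be checked with a short script. This is only a sketch of the rough "fitted value plus or minus 2S" rule, using the coefficients and S value quoted in this post. A proper prediction interval from Minitab would be somewhat wider, since it also allows for uncertainty in the fitted line and uses a t multiplier on 14 degrees of freedom rather than 2.

```python
intercept = 15.822   # from part i
slope = -0.3729      # from part i
s = 0.291111         # S statistic from the output
v4mm = 17            # the athlete's v-4mM

# Fitted (predicted) 3km time in minutes
predicted = intercept + slope * v4mm

# Rough range: fitted value plus or minus twice S
lower = predicted - 2 * s
upper = predicted + 2 * s

print(f"predicted = {predicted:.4f} mins")
print(f"rough range = ({lower:.2f}, {upper:.2f}) mins")
```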

vii) What are the assumptions underlying the model presented and do they look justified based on the residual plots provided?

  • The sample is representative of the population of interest and the subjects are independent. Independence is valid here as the response variable was measured separately for each subject.
  • The relationship between the mean response and the explanatory variable is linear in the population (Linearity assumption). This assumption can be checked by looking at the scatterplot and appears valid because the overall pattern of the response across the different values of the explanatory variable resembles a straight line.
  • The response exhibits variability about the population regression line in the shape of a Normal distribution (Normality assumption); and the standard deviation of the response is the same for any given value of the explanatory variable (Equal Spreads assumption). These assumptions can be checked using suitable residual plots, such as a plot of the standardised residuals (on the vertical axis) against the fitted values from the regression. If the Linearity and Equal Spreads assumptions are valid, this plot should show a random scatter of points;