Stats Project - Navigating and using Stata.

#1
I am currently working on a statistics project using Stata. My dataset has four variables, and I have been given the brief below. The data are attached.

I have posted the questions below and, in brackets, how I think I might go about each one. I know this is a lengthy post, so sorry for that, but any help would be much appreciated.

The variables are the end-of-week share price for a particular stock (y), the end-of-week value for the Standard & Poor 500 index (sap), a binary variable (‘time_dum’) equal to 1 if the data relate to the last half of observations in the series and zero otherwise, and a variable (newt) that denotes the frequency of the data.

I have to:

1. Declare your data as time series.
2. Generate the log of the variables y (‘ly’) and sap (‘lsap’).
3. Generate the log returns of the variables y (‘dly’) and sap (‘dlsap’).

(I have completed steps 1-3)
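For reference, steps 1-3 can be done along these lines. This is only a sketch: it assumes the attached dataset is already loaded in memory and that newt is the weekly time index, which is my reading of the variable description rather than something the brief states.

```stata
* Step 1: declare the data as time series.
* ASSUMPTION: newt is the time index (one value per week, no duplicates).
tsset newt

* Step 2: natural logs of the stock price and the index
generate ly   = ln(y)
generate lsap = ln(sap)

* Step 3: log returns as first differences of the logs.
* D. is the time-series difference operator, so tsset must come first.
generate dly   = D.ly
generate dlsap = D.lsap
```

The first observation of dly and dlsap is missing, because there is no previous week to difference against.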

4. Set out the issues clearly and use descriptive techniques (mean, standard deviation, median, percentiles) and graphs (boxplot, kernel plot, P-P plot, time series plot) to describe the main properties of the two log return series.

(Do I just use sum on dlsap and dly?)
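A minimal sketch of what step 4 could look like, assuming dly and dlsap already exist from step 3 and the data are tsset. Plain summarize only reports the mean, standard deviation, minimum and maximum, so the detail option (or tabstat) is needed for the median and percentiles. The graph names are arbitrary labels chosen here so that each plot stays open.

```stata
* Descriptive statistics, including median and percentiles
summarize dly dlsap, detail
tabstat dly dlsap, statistics(mean sd median p25 p75 min max) columns(statistics)

* Boxplots of both return series side by side
graph box dly dlsap, name(box_ret, replace)

* Kernel density estimates with a normal density overlaid
kdensity dly, normal name(kd_dly, replace)
kdensity dlsap, normal name(kd_dlsap, replace)

* Standardized normal P-P plots
pnorm dly, name(pp_dly, replace)
pnorm dlsap, name(pp_dlsap, replace)

* Time series plot of both return series (requires tsset)
tsline dly dlsap, name(ts_ret, replace)
```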

5. Conduct a series of normality tests for both these log return series, taking into consideration the possible effect of outliers analysed in the previous point.

(swilk and sktest? I am not sure how to account for the effect of outliers.)
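A rough sketch of step 5, assuming dly and dlsap exist from step 3. The outlier check at the end is only one possible approach: it reruns the Shapiro-Wilk test for dly after excluding observations more than 3 standard deviations from the mean, and that 3-sd cutoff is an arbitrary assumption rather than anything in the brief.

```stata
* Shapiro-Wilk and skewness/kurtosis tests on the full sample
swilk dly dlsap
sktest dly dlsap

* Sensitivity to outliers (ASSUMPTION: 3-sd cutoff, chosen for illustration)
quietly summarize dly
local m = r(mean)
local s = r(sd)
swilk dly if abs(dly - `m') <= 3*`s'
```

If the conclusion changes once the extreme observations are excluded, that would suggest the rejection of normality is driven by a few outliers rather than by the shape of the whole distribution.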


6. Test whether there are statistically significant mean differences in the log returns between the y stock and the S&P.

(t-test? Not sure how to do this one. Or a Mann-Whitney U test, since the data are probably not normally distributed?)
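One possible way to set up step 6, assuming dly and dlsap exist from step 3. Because both returns are observed in the same weeks, the sketch treats them as paired. In Stata the Mann-Whitney test (ranksum) needs a single variable plus a by() group, so for two separate variables the paired non-parametric counterpart, the Wilcoxon signed-rank test, is shown instead. Whether paired or unpaired is the right choice here is a judgment call, not something the brief settles.

```stata
* Paired t-test of equal means for the two return series
ttest dly == dlsap

* Unpaired version allowing unequal variances, for comparison
ttest dly == dlsap, unpaired unequal

* Non-parametric paired alternative if normality is doubtful
signrank dly = dlsap
```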

7. Test whether the log returns for the stock series are more volatile than the S&P 500 series.

(sdtest?)
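A minimal sketch for step 7, assuming dly and dlsap exist from step 3. The variance-ratio test compares the two standard deviations, and the one-sided alternative "ratio > 1" in the output corresponds to the stock being more volatile than the index. Note that this test relies on normality, so its result should be read alongside the normality tests.

```stata
* Test of equal standard deviations; ratio = sd(dly) / sd(dlsap)
sdtest dly == dlsap

* One-sided p-value for Ha: sd(dly) > sd(dlsap)
display "Upper one-sided p-value: " r(p_u)
```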

8. Test whether there are statistically significant mean differences in the log returns for the y stock between the first half and the second-half of the sample.

(No idea)
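One possibility for step 8, sketched under the assumption that time_dum is coded 0 for the first half and 1 for the second half as described above, is a two-sample t-test of dly grouped by time_dum.

```stata
* Two-sample t-test of mean log returns, first half vs second half
ttest dly, by(time_dum)

* Same test without assuming equal variances in the two halves
ttest dly, by(time_dum) unequal
```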

9. Test whether the log returns for the stock series are more volatile than the S&P 500 series between the first half and the second-half of the sample.
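The wording of step 9 can be read in more than one way, so this sketch covers two readings, assuming dly, dlsap and time_dum are defined as above: whether each series' volatility differs between the halves, and whether the stock is more volatile than the index within each half.

```stata
* Reading 1: does volatility change between the halves, for each series?
sdtest dly, by(time_dum)
sdtest dlsap, by(time_dum)

* Reading 2: is the stock more volatile than the index within each half?
sdtest dly == dlsap if time_dum == 0
sdtest dly == dlsap if time_dum == 1
```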


10. Test whether there are statistically significant median differences in the log returns for the y stock between the first half and the second-half of the sample.
11. Test whether the time series sequence of log returns for the y stock is random or not.
12. Estimate a simple Capital Asset Pricing Model (CAPM). This involves the Ordinary Least Squares Estimation (OLS) of the following regression model:
dly_t = alpha + beta * dlsap_t + u_t
13. Evaluate and interpret the estimates of this regression model.
14. Comment on the overall goodness of fit of the model.
15. Test a relevant proposition regarding the beta value. Interpret the result.

(Run a regression and test dlsap = 1 to see whether it is a defensive or aggressive stock?)
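A minimal sketch of the regression and the beta = 1 test, assuming dly and dlsap exist from step 3. The coefficient on dlsap is the estimate of beta and _cons is the estimate of alpha.

```stata
* OLS estimate of the CAPM regression
regress dly dlsap

* Wald test of H0: beta = 1
test dlsap = 1

* Estimate of (beta - 1) with its standard error and confidence interval,
* which shows on which side of 1 the beta lies
lincom dlsap - 1
```

A beta significantly below 1 would point to a defensive stock and one significantly above 1 to an aggressive stock.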

16. Estimate the model under the assumption that the intercept is zero. Does the estimate of the stock's beta change much?

(Run the regression with no constant and see whether the beta estimate is different?)
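A sketch of how the two beta estimates could be put side by side, assuming dly and dlsap exist from step 3. The stored names with_const and no_const are arbitrary labels chosen here.

```stata
* Model with intercept
quietly regress dly dlsap
estimates store with_const

* Model with the intercept restricted to zero
quietly regress dly dlsap, noconstant
estimates store no_const

* Coefficients and standard errors in one table
estimates table with_const no_const, b(%9.4f) se(%9.4f) stats(N r2)
```

The R-squared of the no-constant model is computed differently, so it should not be compared directly with the R-squared of the model with an intercept.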

17. Is the beta coefficient stable over time? Estimate a separate model, with and without intercept, for each sub-sample period. How do you interpret the results?
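For step 17, a possible layout is sketched here, assuming time_dum is coded 0/1 as described above. The pooled regression with an interaction term at the end is an addition of mine, not something the brief asks for: it gives a formal test of whether beta differs between the two halves.

```stata
* Separate regressions for each sub-sample, with intercept
regress dly dlsap if time_dum == 0
regress dly dlsap if time_dum == 1

* Separate regressions for each sub-sample, without intercept
regress dly dlsap if time_dum == 0, noconstant
regress dly dlsap if time_dum == 1, noconstant

* Pooled model letting both intercept and slope differ by half
regress dly i.time_dum##c.dlsap

* H0: beta is the same in both halves
test 1.time_dum#c.dlsap
```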

18. Inspect graphically whether the residuals of the regression model satisfy the properties of the residuals.

19. Formally test for properties of the residuals.
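A rough sketch covering steps 18 and 19, assuming the data are tsset and dly and dlsap exist from step 3. The variable name uhat and the graph names are arbitrary choices made here, and the selection of tests (normality, heteroskedasticity, serial correlation) is my guess at which residual properties the brief has in mind.

```stata
* Re-estimate the model and save the residuals
regress dly dlsap
predict uhat, residuals

* Graphical checks
tsline uhat, yline(0) name(res_ts, replace)      // pattern over time
rvfplot, yline(0) name(res_fit, replace)         // residuals vs fitted values
kdensity uhat, normal name(res_kd, replace)      // shape vs normal density
ac uhat, name(res_ac, replace)                   // autocorrelations

* Formal tests
swilk uhat                 // normality of the residuals
estat hettest              // Breusch-Pagan test for heteroskedasticity
estat bgodfrey, lags(1)    // Breusch-Godfrey test for serial correlation
estat dwatson              // Durbin-Watson statistic
```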