I'm taking a graduate-level statistics course at the University of Glasgow which includes a methods assignment. The assignment requires us to correctly carry out several statistical tests, including one- and two-sample t-tests.

To complete my assignment, I'm using a large dataset of bike share data that spans 25 months (discrete hires, n = 221,484).

To check the two-sample t-test, I have an idea to test for a significant difference between two means (obviously) in annual bike hire data. The goal is to test the null hypothesis, “there is no difference in the average number of bike hires between the first and second year”.

The objectives are to:

1. count the number of hires per day for the entire study period (frequency) and

2. cut the first and second years out to create two new datasets, in which case n = 365 for both new samples, and

3. execute, in R: t.test(YearOne$Freq, YearTwo$Freq, paired = FALSE, conf.level = 0.9999)
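
For reference, the three steps above might be sketched in R roughly as follows. This is only a sketch on simulated data: the `hires` data frame, its `Date` column, the 2014 start date and the 20,000 simulated hires are placeholders made up for illustration, not my real dataset. `YearOne`, `YearTwo` and `Freq` are the names used in the call above.

```r
# Simulated stand-in for the real hire records (hypothetical data).
set.seed(1)
all_days <- seq(as.Date("2014-01-01"), by = "day", length.out = 730)
hires <- data.frame(Date = sample(all_days, size = 20000, replace = TRUE))

# Step 1: count hires per day. Using factor levels for every calendar day
# keeps days with zero hires in the table instead of silently dropping them.
day_factor <- factor(as.character(hires$Date), levels = as.character(all_days))
daily <- as.data.frame(table(Date = day_factor), stringsAsFactors = FALSE)
daily$Date <- as.Date(as.character(daily$Date))

# Step 2: cut out the first and second 365-day periods.
start <- min(daily$Date)
YearOne <- daily[daily$Date >= start & daily$Date < start + 365, ]
YearTwo <- daily[daily$Date >= start + 365 & daily$Date < start + 730, ]

# Step 3: independent two-sample t-test on the daily counts.
# With the default var.equal = FALSE, t.test() runs Welch's version,
# which does not assume equal variances in the two years.
result <- t.test(YearOne$Freq, YearTwo$Freq, paired = FALSE, conf.level = 0.9999)
print(result)
```

The sample sizes and standard deviations can then be checked with nrow(YearOne), nrow(YearTwo), sd(YearOne$Freq) and sd(YearTwo$Freq).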

However, I’m confused about whether this data is eligible for a t-test, because:

1. N>30, and

2. I can calculate the standard deviation (SD) for both datasets.

I’m confused because my book (Andy Field: Discovering Statistics Using R) allows degrees of freedom over 30 (actually, up to 100, then infinity). Furthermore, the standard deviation of the samples can be calculated, but I may not calculate the SD for the programme’s overall lifecycle.

Any thoughts? May I use an independent t-test to test the null hypothesis?

This may broadly be a misunderstanding of what a t-test is, and what a sample is.

Thank you all in advance.