Interim Analysis

#1
I'm tasked with taking over a job from someone else. It seems to me this person's been running an interim analysis monthly. Consider the following (extremely simplified) scenario: We have two groups: treatment and control.
Month 1: the person ran a two-sample t-test to compare the group averages
Month 2: the person ran a two-sample t-test with updated treatment and control groups
Month 3: the person ran a two-sample t-test with updated treatment and control groups
Month 4: etc.

I know there are several issues with this setup:

1.) The response variable (whatever it is) will be correlated from one month to the next.
2.) It glosses over the fact that there are potential repeated measures from month to month.
3.) I highly doubt the variance can be pooled.
4.) I don't know where to begin thinking about type I error, type II error, and power, among other things.
5.) Issues I haven't thought of yet.

I've been doing some reading and I think this is similar to group sequential testing. I have not done this analysis before, but I understand it is used in clinical studies when the treatment and control groups change over time. Can anyone help me understand this? Ultimately, I think this boils down to updating results as more data comes in.
 
#2
Yeah, your intuition is right that there can be issues, especially "researcher degrees of freedom". A big question is why the monthly looks are occurring: are the groups the same, with only the sample sizes getting bigger as time elapses? Also, the analyses aren't being conducted to look at adverse events, or at superiority of one group, either of which might result in stopping the study early, correct?

I will wait for these answers before chiming in with more questions and content.
 
#3
So, it's a proof of concept. It's a monthly report to monitor metric(s) (in this case averages) between a treatment and a control group. And I have the same concerns about researcher degrees of freedom, especially when you slice up a dataset in different ways. It seems to me, at least, that a lot of non-statisticians think statistical significance is a goal to be attained. I haven't looked at it too hard yet, but I think the sample sizes are getting bigger over time in the hopes of "reaching statistical significance".
 
#4
Treating statistical significance as a goal is a prevalent viewpoint I have encountered (both explicitly and implicitly when talking with people), and it becomes even more dangerous when coupled with a misunderstanding of p-values and multiplicity (especially without a priori subject matter expertise).
 
#5
Yeah. So, if I'm understanding this project correctly: new data is being collected each month, and I believe each month's data can be considered a random sample. But the test statistics are generated from a pooled window of data (trailing 3 months, trailing 5 months, trailing 9 months, whatever the user chooses). I believe this setup introduces bias and inflates the type I error rate. Is that fair to say?
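
One way to check the concern about user-chosen trailing windows is a small null simulation. The sketch below is only an illustration under made-up assumptions: nine months of data, 30 observations per group per month, both groups drawn from the same N(0, 1), and a normal approximation standing in for the two-sample t-test so that only the Python standard library is needed. The function names are hypothetical. It compares a window fixed in advance with picking whichever of the three windows gives the smallest p-value.

```python
import random
from statistics import NormalDist

def z_pvalue(treat, ctrl):
    """Two-sided p-value for a difference in means (normal approximation
    to the Welch two-sample t-test; reasonable for the sample sizes used here)."""
    def mean_var(x):
        m = sum(x) / len(x)
        return m, sum((v - m) ** 2 for v in x) / (len(x) - 1)
    mt, vt = mean_var(treat)
    mc, vc = mean_var(ctrl)
    z = (mt - mc) / (vt / len(treat) + vc / len(ctrl)) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

def window_choice_error(windows=(3, 5, 9), n_months=9, n_per_month=30,
                        n_sims=2000, alpha=0.05, seed=0):
    """Return (false positive rate with the longest window fixed in advance,
    false positive rate when the best-looking window is chosen afterwards)."""
    rng = random.Random(seed)
    fixed_hits = best_hits = 0
    for _sim in range(n_sims):
        # No true effect: both groups come from the same N(0, 1) every month.
        treat = [[rng.gauss(0, 1) for _ in range(n_per_month)] for _ in range(n_months)]
        ctrl = [[rng.gauss(0, 1) for _ in range(n_per_month)] for _ in range(n_months)]
        pvals = []
        for w in windows:
            t = [x for month in treat[-w:] for x in month]  # trailing w months
            c = [x for month in ctrl[-w:] for x in month]
            pvals.append(z_pvalue(t, c))
        fixed_hits += pvals[-1] < alpha  # last entry of `windows`, chosen up front
        best_hits += min(pvals) < alpha  # window picked after seeing all results
    return fixed_hits / n_sims, best_hits / n_sims

fixed_rate, best_rate = window_choice_error()
print(f"fixed window: {fixed_rate:.3f}, best of several windows: {best_rate:.3f}")
```

Because the fixed window is one of the candidates, the "best of several" rate can never be lower than the fixed rate; how much higher it is depends on how many windows are tried and how much they overlap.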
 
#6
I think so, and that is where the issue of multiplicity arises in Frequentist approaches (it also covers people who literally keep collecting data until they achieve a low p-value). Frank Harrell mentions that this is an advantage of Bayesian approaches, because you are just updating, without the multiplicity issue. The issue is that the data are being pooled, as you said. Your skills will be better than mine, and asking a professor in your program might help since it seems tricky, but if there is bias you can calculate it (or simulate it).
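
A rough sketch of what "simulate it" could look like for the monthly-looks part, under assumptions that are purely illustrative: 30 new observations per group each month, no true difference between groups, data accumulated across months, and a normal approximation to the two-sample t-test (standard library only). It counts how often at least one monthly look crosses p < 0.05.

```python
import random
from statistics import NormalDist

def false_positive_rate(n_looks, n_per_month=30, n_sims=2000, alpha=0.05, seed=1):
    """Share of null simulations that reach p < alpha at any of n_looks
    monthly looks, testing the accumulated data at each look."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _sim in range(n_sims):
        n = 0               # observations per group so far
        s = [0.0, 0.0]      # running sums for treatment, control
        ss = [0.0, 0.0]     # running sums of squares
        for _look in range(n_looks):
            for g in (0, 1):
                for _ in range(n_per_month):
                    x = rng.gauss(0, 1)  # same distribution in both groups
                    s[g] += x
                    ss[g] += x * x
            n += n_per_month
            var = [(ss[g] - s[g] ** 2 / n) / (n - 1) for g in (0, 1)]
            z = (s[0] / n - s[1] / n) / ((var[0] + var[1]) / n) ** 0.5
            if abs(z) > z_crit:  # "significant" at this look: stop and report
                hits += 1
                break
    return hits / n_sims

single = false_positive_rate(n_looks=1)
monthly = false_positive_rate(n_looks=12)
print(f"one planned look: {single:.3f}, twelve monthly looks: {monthly:.3f}")
```

With a single planned look the rate should sit near the nominal 5%, while testing every month for a year pushes it well above that. The real report would need its own sample sizes, correlation structure, and test plugged in.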

I skimmed this and it seems a bit relevant: https://sites.psu.edu/zitaoravecz/files/2016/09/Oravecz2016SBUFB-19tbvpm.pdf

Maybe @Dason or @hlsmith can comment more on the Bayesian approach, as I'm not well versed in that at all. I'm also not very familiar with sequential monitoring, even on a Frequentist basis (now I will look for papers!).
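
For orientation, one simple frequentist form of sequential monitoring uses the same critical value at every look, raised just enough that the overall type I error stays at the nominal level (a Pocock-style boundary). The sketch below estimates that constant by Monte Carlo rather than from published tables. It assumes equally sized batches of new data at each look and approximately normal test statistics; the function name and the choice of five looks are hypothetical.

```python
import random

def pocock_style_boundary(n_looks=5, alpha=0.05, n_sims=20000, seed=2):
    """Monte Carlo estimate of a constant critical value c such that, under
    the null, P(|Z_k| > c at any of n_looks equally spaced looks) ~ alpha."""
    rng = random.Random(seed)
    max_abs_z = []
    for _sim in range(n_sims):
        total, worst = 0.0, 0.0
        for k in range(1, n_looks + 1):
            total += rng.gauss(0, 1)   # standardized contribution of look k's new data
            z_k = total / k ** 0.5     # cumulative z-statistic at look k
            worst = max(worst, abs(z_k))
        max_abs_z.append(worst)
    max_abs_z.sort()
    # Empirical (1 - alpha) quantile of the largest |z| seen across looks.
    return max_abs_z[int((1 - alpha) * n_sims)]

c = pocock_style_boundary()
print(f"constant boundary for 5 looks: |z| > {c:.2f} (vs 1.96 for a single look)")
```

For five looks the estimate should land around 2.4, noticeably stricter than 1.96. Dedicated group sequential software also offers other boundary shapes and handles unequal look sizes, which this sketch does not.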