# Request for advice: sample size for 3 arm trial with binary outcome

#### ndphysio

##### New Member
Hello

I have a binary outcome (success or failure). Can someone advise me how to compute the sample size for a 3-arm trial comparing 3 interventions? Success will be defined as complete relief of symptoms, and the percentage of patients achieving it will be computed. In the previous literature I found studies comparing intervention A with B, which reported that 64% of patients had complete relief with intervention A and 31% with intervention B, but I could not find studies evaluating the success rate with A+B, which is my third group (combined A+B). Can you suggest a command I can use to compute the sample size for my 3-arm trial in Stata 15, or is there any other software I could use?

I have tried G*Power, but I am unsure what to enter for the following:
Corr among rep measures = ?
Nonsphericity correction ε = ?

I would be thankful if you could advise me on values to use for these 2 settings.

The other values I could input in G*Power are:
F tests - ANOVA: Repeated measures, within-between interaction
Analysis: A priori: Compute required sample size
Input: Effect size f = 0.5
α err prob = 0.05
Power (1-β err prob) = 0.80
Number of groups = 3
Repetitions = 4
Corr among rep measures =
Nonsphericity correction ε =

Thank you so much for your time.
Regards
Neha

#### hlsmith

##### Not a robit
Is the intervention group assignment randomized?

So you have groups A, B, C, but also want to collapse A and B into one group in the analyses?

So you want to compare A v B, A v C, B v C, and A + B v C, correct? I would just find the comparison with the smallest effect size, and do a power calculation for that comparison, since it will be your limiting factor in establishing power.

Also, if you are making multiple comparisons you will need to correct your alpha value for the risk of false discovery, e.g., alpha = 0.05/4 = 0.0125.

If you are having trouble doing the power calculation, you can always just simulate the two intervention groups based on prior info and see what sample size you need to reach power (1 - beta) of 0.8 or higher at alpha = 0.0125.
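A minimal sketch of that simulation idea in Python, standard library only. It is not from the thread: the function names, the choice of a pooled two-proportion z-test, the number of simulated trials, and the search grid are all assumptions made for illustration. The 64% and 31% success rates are the literature values quoted in the first post.

```python
import random
from statistics import NormalDist


def simulated_power(p1, p2, n_per_group, alpha=0.0125, n_sims=2000, seed=1):
    """Estimate the power of a two-sided two-proportion z-test by simulation."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        # Simulate the number of successes in each arm of one trial.
        x1 = sum(rng.random() < p1 for _ in range(n_per_group))
        x2 = sum(rng.random() < p2 for _ in range(n_per_group))
        pooled = (x1 + x2) / (2 * n_per_group)
        se = (pooled * (1 - pooled) * 2 / n_per_group) ** 0.5
        if se == 0:
            continue  # every patient had the same outcome: count as no rejection
        z = (x1 - x2) / n_per_group / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims


def smallest_n(p1, p2, target_power=0.80, alpha=0.0125,
               start=10, step=5, max_n=1000):
    """Step through per-group sizes until the simulated power reaches the target."""
    n = start
    while n <= max_n:
        if simulated_power(p1, p2, n, alpha) >= target_power:
            return n
        n += step
    return None  # target not reached within max_n


# Assumed inputs: 64% success with A, 31% with B, Bonferroni-style alpha.
n_needed = smallest_n(0.64, 0.31)
print("Approximate patients per group:", n_needed)
```

Because the estimate comes from random draws, the result shifts a little with the seed and the number of simulated trials, so it is best read as a ballpark figure. The same functions can be rerun for whichever pair of arms has the smallest expected difference.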

#### ndphysio

##### New Member
Hello Hlsmith
I should first correct my statement: I have 3 groups
Grp1: Intervention A,
Grp2: Intervention B,
Grp3: Intervention A+B
So, I will be comparing grp1 vs grp2, grp1 vs grp3, and grp2 vs grp3.
I will be assessing the outcome at baseline, 6wk, 3m, 6m and 1yr FU. So, can you please suggest a command I can use to compute the sample size for my 3-arm trial in Stata 15, or is there any other free software I could use?
Or can you please advise me what the following should be if I want to use G*Power for the sample size computation:
Corr among rep measures = ?
Nonsphericity correction ε = ?
Thank you so much again
Regards
Neha

#### hlsmith

##### Not a robit
What analysis do you plan to conduct, repeat measures, survival analysis, etc.?

#### ndphysio

##### New Member
Repeated measures/ Anova
Thank you

#### ndphysio

##### New Member
I am supposed to know this sample size by tonight. I will be greatly thankful if you kindly help me. Thank you.

#### EdGr

##### New Member
Repeated measures ANOVA is designed for numeric outcomes. You have a binary outcome. I would probably err on the simple side and do a chi-square test at each time point. For that you just need to estimate the 3 percentages of success. For power I would pick the time point of maximum interest, and 3 success percentages that represent the smallest theoretically important differences. So 64% with A, 31% with B, maybe 75% with both? But err conservative -- if the percentages were, say 75%, 64%, and 50%, would that still be enough difference as to be worth showing?

If you have one time at which the effect should be maximum (specified in advance, not after the fact), say that, and test at that time point with alpha = 0.05, all the rest at alpha = 0.01. If you can't specify which time will be best, you may need to test them all at alpha = 0.01. But even in the former case, you need enough power for specific comparisons, not just the overall. Suppose the combined only made a 10% difference compared to A. How many subjects would you need for that comparison, especially if correcting for multiple comparisons?

Unless somebody else has a better analysis plan. Not sure how comfortable you are with fancier methods. An alternative would be Kaplan-Meier, with time to complete success as the outcome. That combines all the times, eliminates all but the most basic multiple comparison and is also fairly straightforward to do in a power calculator. But you may have to specify some things, like the parameters of the survival curves.

#### hlsmith

##### Not a robit
Is the intervention group assignment randomized?

#### ndphysio

##### New Member
Hello EdGr, I am not doing survival analysis. I will be using repeated measures because all my other outcomes (self-reported measures: secondary outcomes), which will be assessed at baseline, 6wk, 3m, 6m and 1yr FU, are continuous. Complete relief from symptoms will be considered success (primary outcome), and the percentage of cases with complete relief (success), partial relief and no relief (failure) will be computed at 6wk, 3m, 6m and 1yr FU from descriptive statistics.
So, I will not be doing survival analysis. I am from a medical background and know only basic statistics. Can you please advise me on how to proceed, and kindly let me know if I should provide any additional information.
Thank you
Regards
Neha

#### EdGr

##### New Member
If your primary outcome is a yes/no variable, I would not attempt to do repeated measures analysis without a statistician familiar with such techniques on board. I'm not even sure how I would go about doing a group by time repeated measures analysis with a dichotomous outcome. So my advice to do sample size by chi-square stands.

All the rest can be ANOVA, as you specify. But again, I would simplify. Focus on 1 key time point for power. Or maybe on the main effect of group, and of course any specific group to group comparisons of interest.

If you decide to try Stata power, be sure you understand which effect and comparison it is calculating power FOR. You can probably try a range of correlations, like 0.3 to 0.8, and see how much difference the variations make. If not much, then you can just pick a reasonable one. I don't have time to look up epsilon right now, but I think it is multiplied by the degrees of freedom, so 1 means no correction. Try values from 1 down to 0.5 and see what difference it makes. But note -- this is for numeric measures, not a yes/no outcome.

You needed this by yesterday. Needless to say, the way to get help by yesterday is to have asked a week ago!

#### ndphysio

##### New Member
Thanks so much EdGr. I have actually sorted it out using this formula:
$$n = \frac{(Z_{\alpha/2} + Z_{\beta})^2 \, \left[\, p_1(1-p_1) + p_2(1-p_2) \,\right]}{(p_1 - p_2)^2}$$

Ref: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3148614/

I have used the method for comparing 2 proportions from the above reference.
I computed the sample size needed to compare grp1 vs grp2, grp2 vs grp3 and grp1 vs grp3. I wrote that I clinically expect/assume that 85% of patients will get relief with the combined intervention (grp3). Previous literature says 64% get relief with intervention A and 31% with intervention B.
Based on that I have chosen the largest sample needed. I found that I need a sample of 192 patients (64 in each grp). This is because 27 patients are needed to see if there is a difference between grp2 and grp3, 96 patients are needed to see if there is a difference between grp1 and grp2, and 192 are needed to see a difference between grp1 and grp3. So, the maximum number of patients needed will be 192. I am unsure if this is the correct method or not. So, I have forwarded what I could work out about the sample size to my supervisor and I will wait to hear back if she has any advice.
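For anyone wanting to check this arithmetic, here is a rough Python sketch of the same two-proportion formula (standard library only). The function and variable names are made up for illustration, and rounding each per-group result up to the next whole patient is an assumption; the post does not say how the figures were rounded.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-group n from (Z_{alpha/2} + Z_beta)^2 [p1(1-p1) + p2(1-p2)] / (p1-p2)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2


# Success rates quoted in the thread: 64% (A), 31% (B), assumed 85% (A+B).
rates = {"A": 0.64, "B": 0.31, "A+B": 0.85}
pairs = [("A", "B"), ("A", "A+B"), ("B", "A+B")]

# Round each comparison up to whole patients per group (an assumption).
results = {pair: ceil(n_per_group(rates[pair[0]], rates[pair[1]])) for pair in pairs}
for (first, second), n in results.items():
    print(f"{first} vs {second}: {n} per group")

largest = max(results.values())
print(f"Limiting comparison: {largest} per group, {3 * largest} in total")
```

Under these assumptions the limiting comparison, A versus A+B, comes out at 64 per group (192 in total), matching the post. Rounding up gives 33 and 10 per group for the other two comparisons, slightly above the 32 and 9 per group implied by the totals of 96 and 27, which suggests those were rounded to the nearest whole number.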

In the meantime, can you please advise me: do you think my method of computing the sample size seems OK to use or not?

Yeah, I agree that I should have posted it earlier, but I came to know about talkstats yesterday for the first time while I was trying to sort out some solution for the sample size. So, I registered myself yesterday. I really appreciate you guys for your time and selfless help to students.

Thank you again.

#### EdGr

##### New Member
When I compare 85% versus 64% using a 2-tailed test at alpha = 0.05, I get 67 per group, which would be 201 total for 3 groups. This is obviously pretty close to what you got. If, however, I did that comparison at alpha = 0.05/3 to correct for the number of comparisons, I would need a larger sample size. Are you defining that comparison as primary? In fact, do you even care about A versus B? Could you do a 2-group study? If A versus C is, at one time point of interest, the primary comparison, you could justify your number at alpha = 0.05. Just be sure you are laying that out in advance so everyone can see how you plan to analyze.

Now, suppose your advisor says, "Well, historically there was a difference of 85% vs. 64%, but we'd better plan to detect a smaller difference, just in case, say 80% versus 65%." Then the required sample size basically doubles. You need to decide how you want to handle the tradeoff between making sure you don't miss anything important and increasingly large and unwieldy numbers of subjects. You also need to consider type I error.
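The post does not say which calculator or formula produced 67 per group. One common variant that gives that figure is the two-proportion formula with a pooled variance under the null hypothesis, so the Python sketch below uses it as an assumption. The function and variable names are illustrative only.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_pooled(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for a two-sided two-proportion z-test, pooled variance under H0."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    null_sd = sqrt(2 * p_bar * (1 - p_bar))        # SD term under the null
    alt_sd = sqrt(p1 * (1 - p1) + p2 * (1 - p2))   # SD term under the alternative
    return ((z_alpha * null_sd + z_beta * alt_sd) / (p1 - p2)) ** 2


# 85% vs 64% at the unadjusted alpha.
base = ceil(n_per_group_pooled(0.85, 0.64))
# Same comparison with alpha divided by 3 comparisons (Bonferroni).
adjusted = ceil(n_per_group_pooled(0.85, 0.64, alpha=0.05 / 3))
# The more cautious 80% vs 65% scenario at the unadjusted alpha.
cautious = ceil(n_per_group_pooled(0.80, 0.65))

print("85% vs 64%, alpha = 0.05:  ", base, "per group")
print("85% vs 64%, alpha = 0.05/3:", adjusted, "per group")
print("80% vs 65%, alpha = 0.05:  ", cautious, "per group")
```

Under this assumed formula the first line gives 67 per group, the alpha correction pushes the requirement noticeably higher, and the 80% versus 65% scenario needs roughly twice as many patients per group as the 85% versus 64% one.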

Realistically, sample size estimation is heavily subjective. I see it as a way of getting in the right ballpark, rather than an exact calculation, because there are so many subjective decisions.
