# Minimum control size for determining effectiveness of treatment

#### xandalorian

##### New Member
So the following scenario

• I have a treatment that I know to be effective, but I want to determine precisely how effective it is in a (large) group of individuals
• I'm dividing them into a test group and a control group so I can compare the effect of the treatment
• However, I want to keep the test group as large as possible (so as many people as possible get the treatment) and the control as small as possible, while still being able to determine how effective the treatment is with some confidence
• With other (unrelated) tests in the past, we have arbitrarily used an 80/20 test/control split, but we now want to make our tests more effective by keeping the control group as small as possible

What approach can I use to determine this?

Thanks

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Well, as you know, the smaller the control group, the greater the risk of sampling variability in that group (reflected in its calculated standard error). I would perhaps play around with simulating these data and running tests on the simulated data.
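For instance, a quick Python sketch (the 5% response rate is a made-up placeholder) shows how the standard error of the control-group estimate grows as that group shrinks:

```python
import random
import statistics

# Sketch: sampling variability of the estimated control response rate
# as the control group shrinks. The 5% true rate is a placeholder.
random.seed(42)
TRUE_RATE = 0.05
N_SIMS = 500  # simulated experiments per group size

sd_by_n = {}
for n_control in (100, 1_000, 10_000):
    estimates = [
        sum(random.random() < TRUE_RATE for _ in range(n_control)) / n_control
        for _ in range(N_SIMS)
    ]
    sd_by_n[n_control] = statistics.stdev(estimates)
    print(f"n={n_control:6d}  sd of estimated rate = {sd_by_n[n_control]:.4f}")
```

The spread of the estimates shrinks roughly with the square root of the control size, which is exactly the sampling variability issue above.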

#### xandalorian

##### New Member
There's no direct way to calculate this? I'm looking for something like this http://www.evanmiller.org/ab-testing/sample-size.html (though this is for an evenly split A/B test)

I would imagine that this issue comes up all the time (need to have a control group for verification, but want to keep it as small as possible). Are there any resources or papers published on how to approach this problem?

Thanks

EDIT: I did find a potential calculator and am trying to determine if it does what I want it to do. Does this look right (or wrong) to anyone?

http://stochasticsolutions.com/cgi-bin/fleiss/fleiss.cgi

It uses the Fleiss formula - more info here http://stochasticsolutions.com/fleiss-help.html

Reading through it, it does sound like it's what I'm looking for.


#### Dason

##### Ambassador to the humans
What is the most important question for you to answer? Keep in mind that by moving the split away from 50/50 you can make a smaller confidence interval around the mean of the treatment, but you're actually increasing the width of the confidence interval for the difference between the treatment and control means.
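Here's a quick Python illustration of that tradeoff (the 5%/7% rates and the 20K total are hypothetical placeholders):

```python
import math

# Placeholder rates: 5% control, 7% treatment; total sample fixed.
N = 20_000
p_c, p_t = 0.05, 0.07

results = {}
for control_frac in (0.50, 0.20, 0.10, 0.05, 0.01):
    n_c = int(N * control_frac)
    n_t = N - n_c
    # standard error of the treatment-rate estimate alone
    se_treat = math.sqrt(p_t * (1 - p_t) / n_t)
    # standard error of the estimated difference between the two rates
    se_diff = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    results[control_frac] = (se_treat, se_diff)
    print(f"control {control_frac:4.0%}: SE(treatment)={se_treat:.5f}  "
          f"SE(difference)={se_diff:.5f}")
```

As the control fraction drops, the treatment estimate gets slightly tighter while the difference estimate gets much wider, since the control term dominates.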

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Is there randomization of treatment?

Are there sample size limitations?

How is the outcome defined?

Do you have hypothesized effect estimates and dispersion for the outcome (if applicable)?

You can move forward any way you like, but this seems ideal for a data simulation.
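For example, a bare-bones Monte Carlo power check might look like this in Python (all the rates and group sizes are hypothetical placeholders, and a two-proportion z-test stands in for whatever test you'd actually use):

```python
import math
import random
from statistics import NormalDist

# Monte Carlo power check for one chosen split. Placeholders:
# 5% control rate, 7% treatment rate, 1,000 controls vs 9,000 treated.
random.seed(1)
p_c, p_t = 0.05, 0.07
n_c, n_t = 1_000, 9_000
z_crit = NormalDist().inv_cdf(0.975)  # two-sided alpha = 0.05
n_sims = 400

rejections = 0
for _ in range(n_sims):
    # simulate one experiment
    x_c = sum(random.random() < p_c for _ in range(n_c))
    x_t = sum(random.random() < p_t for _ in range(n_t))
    # two-proportion z-test with pooled variance
    p_pool = (x_c + x_t) / (n_c + n_t)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
    if abs(x_t / n_t - x_c / n_c) / se > z_crit:
        rejections += 1

power = rejections / n_sims
print(f"estimated power with {n_c} controls: {power:.2f}")
```

Rerun with different `n_c`/`n_t` splits (holding the total fixed) to see where the power drops below whatever you consider acceptable.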

#### xandalorian

##### New Member
> Is there randomization of treatment?

Yes

> Are there sample size limitations.

Total population is 500K, control group should be as small as possible

> How is the outcome defined?

It's a binary outcome.

> Do you have hypothesized effect estimates and dispersion for outcome (if applicable).

Not sure, but we would have some baseline data.

This is actually for an email campaign, but it sounded like a problem from epidemiology, so I thought I'd try to translate it here. Basically, I've been asked to either find or create a simple tool for determining minimum sample size based on some parameters. I found this tool http://stochasticsolutions.com/cgi-bin/fleiss/fleiss.cgi but I've run into a few bugs with it, and there isn't a lot of technical detail, so I'm hesitant (there is some explanation for it here, though: http://stochasticsolutions.com/fleiss-help.html)

It uses the Fleiss formula, and the method seems to come from this paper http://www.jstor.org/stable/2529990?seq=1#page_scan_tab_contents

It does sound like this is what I need, but I can't find any R packages for it (which would allow me to do it myself and verify the tool)...
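In case it helps anyone verify the tool, here is a rough Python sketch of the Fleiss unequal-allocation formula with the continuity correction, as I understand it from that help page (the function and the 5%/7% example numbers are mine, so check it against the paper before trusting it):

```python
import math
from statistics import NormalDist

def fleiss_sample_size(p1, p2, alpha=0.05, power=0.80, r=1.0):
    """Size of group 1 (e.g. the control) for a two-proportion test,
    with group 2 of size r * n1, using the Fleiss formula plus the
    continuity correction. Sketch only -- verify against Fleiss,
    Levin & Paik before relying on it.
    """
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha / 2)
    z_b = nd.inv_cdf(power)
    delta = abs(p1 - p2)
    p_bar = (p1 + r * p2) / (1 + r)   # allocation-weighted average rate
    # uncorrected size of group 1
    n = (z_a * math.sqrt((r + 1) * p_bar * (1 - p_bar))
         + z_b * math.sqrt(r * p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / (r * delta ** 2)
    # Fleiss continuity correction
    return math.ceil((n / 4) * (1 + math.sqrt(1 + 2 * (r + 1) / (n * r * delta))) ** 2)

# Hypothetical example: 5% control rate vs 7% treatment rate,
# treatment group 9x the size of the control.
n_control = fleiss_sample_size(0.05, 0.07, r=9)
print(f"{n_control} controls, {9 * n_control} treated")
```

If this matches the calculator's output for a few inputs, that would be some evidence it implements the same formula.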

#### Dason

##### Ambassador to the humans
Quoted since it got ignored:

> What is the most important question for you to answer? Keep in mind that by moving the split away from 50/50 you can make a smaller confidence interval around the mean of the treatment but you're actually increasing the width of the confidence interval for the difference between the treatment and control means.

You keep saying you want the control as small as possible and I can understand why intuitively you would think that would be the way to go. But once again - what is your actual goal? What is the most important question you want to answer?

#### xandalorian

##### New Member
Sorry missed that reply. The question to answer is "how much better is the test group performing", but it comes with the constraint that we want as many people as possible to receive the treatment (email).

#### hlsmith

##### Less is more. Stay pure. Stay poor.
So you have an exposure (control or experiment group) and binary outcome. You are looking at a simple logistic model (it seems).

Now you need to hypothesize the proportion of successful outcome for each exposure group.

And you are assuming there will be 500k observations? Given that sample size, you can probably find a difference pretty easily if it truly exists. You will also need to make a decision on the meaningfulness of the difference, since the test will be overpowered until you start shifting the number of observations into the experimental group.

#### Dason

##### Ambassador to the humans
> since the test will be overpowered until you start shifting the number of observations into the experimental group.

This is bad logic. No test is over powered. If you ever think that a test is "over powered" then you're not actually performing a test of what you're interested in.

#### xandalorian

##### New Member
Sorry, let me reduce that to a better example. I have 20K people to send the email to. I'm tasked with answering "what is the minimum number of people needed in the control group for us to be able to determine the improvement of the test group?" Previously they have just done an 80/20 split as a standard; now they are looking for a specific answer to the question, along with some kind of motivation. Should we have 1%, 5%, 10%, etc. in the control group?

#### Dason

##### Ambassador to the humans
It entirely depends on what the effect size is. What is your response variable anyways? Is it a number or just a yes/no binary kind of thing?
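For a binary outcome you can see that dependence directly: hold the total fixed, assume an effect, and compute the approximate power at each split. A Python sketch (the 5%/7% rates are hypothetical placeholders, and the normal approximation is rough):

```python
import math
from statistics import NormalDist

# Approximate power of a two-proportion z-test at each control split,
# total fixed at 20K. The 5% vs 7% rates are placeholders -- substitute
# your own baseline rate and the smallest lift you care about.
nd = NormalDist()
N = 20_000
p_c, p_t = 0.05, 0.07
z_a = nd.inv_cdf(0.975)  # two-sided alpha = 0.05

power_by_frac = {}
for frac in (0.01, 0.05, 0.10, 0.20, 0.50):
    n_c = int(N * frac)
    n_t = N - n_c
    # standard error of the difference in proportions
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    power_by_frac[frac] = nd.cdf(abs(p_t - p_c) / se - z_a)
    print(f"control {frac:4.0%}: power ≈ {power_by_frac[frac]:.2f}")
```

With these particular (made-up) rates a 5% control split already lands near 80% power, while 1% falls far short, which is why no single split percentage can be right for every effect size.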

#### xandalorian

##### New Member
Well there are a few different scenarios. In the simplest case, it's just binary. But there are situations where it will be a number (didn't want to bring that up at first due to it being more complicated).

> It entirely depends on what the effect size is.

Yes, but let's assume it's x...how would I proceed (in the binary case)?

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Dason, you are being too logical. Overpowered is a broad idea: say you want to show two parameter estimates are different, they truly are different, and you have a sample approaching the population size. Well, you have plenty of people to show a difference. Now say I start making one subgroup a smaller proportion; its n-value used in the standard error will get smaller, etc. So you are taking a very well-powered comparison and shrinking it down to the bare minimum needed to power the test. So from overpowered to marginally powered. You get this concept, but you love to be a stickler. And no, you can never be statistically overpowered, but from a study design standpoint you can be overpowered given caveats (risk, cost, burden, etc.).

P.S. The overpowered idea usually comes into play when you are able to detect non-relevant or clinically insignificant differences.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Xandalorian,

You should just take a sample or hypothetical effect size and simulate varying proportion sizes based on it until you start getting type II errors (failing to detect a difference that is really there). Then you may want to buffer up the proportion a little for caution's sake (in case your hypothesized effect was off).

I am not savvy in the automation department, but you could run a loop or macro of sorts to output the p-values for a descending proportion size and plot them to understand the point where your proportion size gets questionable. Once you have it for your binary example, just switch from Bernoulli base data to random normal or modified normal, etc.
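A rough Python version of that loop (placeholder rates of 5% vs 7%, placeholder total of 10K, two-proportion z-test, control fraction descending):

```python
import math
import random
from statistics import NormalDist

# Loop sketch: simulate the campaign at descending control proportions and
# track how often a two-proportion z-test detects the assumed effect.
# The 5% vs 7% rates and N = 10,000 are hypothetical placeholders.
random.seed(7)
N, n_sims = 10_000, 200
p_c, p_t = 0.05, 0.07
z_crit = NormalDist().inv_cdf(0.975)  # two-sided alpha = 0.05

def rejects(n_c, n_t):
    """Simulate one experiment; return True if the test rejects."""
    x_c = sum(random.random() < p_c for _ in range(n_c))
    x_t = sum(random.random() < p_t for _ in range(n_t))
    p_pool = (x_c + x_t) / (n_c + n_t)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
    return abs(x_t / n_t - x_c / n_c) / se > z_crit

detect_rate = {}
for frac in (0.20, 0.10, 0.05, 0.02):
    n_c = int(N * frac)
    detect_rate[frac] = sum(rejects(n_c, N - n_c) for _ in range(n_sims)) / n_sims
    print(f"control {frac:.0%}: effect detected in {detect_rate[frac]:.0%} of runs")
```

Plot the detection rate against the control fraction and the "questionable" point hlsmith describes shows up as the knee where the rate starts falling off.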