# Power Calculation for a Multilevel Linear Regression

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Well, I am getting jealous by all of the posting on this topic. I have my own power calculation that I need to run and thought I would see if you all could provide a little guidance. I haven't done one for MLM data before.

Study:
-Two treatments groups (Treatment1 = Tx1; Treatment2 = Tx2)
- 30 subjects (#subjects)
- Crossover design: each subject will receive each treatment five times then the other treatment. Subjects treatment order is balanced via randomization.
- Outcome: continuous variable (DV), 5 serial measures for Tx1 and 5 serial measures for treatment Tx2, order of serial Tx randomized

Assumptions:
Tx1_1 will have an effect on DV of rnorm(#subject, 25, 5) and the value will go down with each Tx1 application, so autoregressive correlation across 5 serial measures

Tx2_1 will have an effect on DV of Tx1_1 + rnorm(subject#, 1.5, 0.5) and the DV value will go down with each Tx2 application, so autoregressive
correlation across 5 serial measures)

The second treatment given (either Tx1 or Tx2) will start off with an effect on DV of rnorm(subject#, 5, 1.5) lower than the first observation for the first time in the first period.

I am happy to do this in R. Below is what the dataframe could look like, these data are manually made up.

Code:
Sequence  Time  Period  Subject  Treatment  DV
1 1 1 1 1 25
1 2 1 1 1 24
1 3 1 1 1 22
1 4 1 1 1 20
1 5 1 1 1 20
1 1 2 1 2 20
1 2 2 1 2 18
1 3 2 1 2 16
1 4 2 1 2 15
1 5 2 1 2 20
2 1 1 2 2 19
2 2 1 2 2 19
2 3 1 2 2 17
2 4 1 2 2 16
2 5 1 2 2 15
2 1 2 2 1 16
2 2 2 2 1 16
2 3 2 2 1 14
2 4 2 2 1 13
2 5 2 2 1 13
...
1 5 2 30 1 15
So I guess I need to write out a function, but the correlation between DV values will need to be accounted for.

Tx1_1_1 = treatment 1 first period and first timepoint
E(DV_Tx1) = (Tx1_1_1) + (AR(Tx1_1_1) +,..., (AR(Tx1_1_4), something like this gets the first period obs for Tx1
E(DV_Tx2) = (Tx1_1_1 + Tx2_1_1) + (AR(Tx2_1_1) +,..., (AR(Tx2_1_4), something like this gets the first period obs for Tx2
Then I need to add an indicator to halve of the obs if they occurred during the second period, so Tx?_2_1 will be lower.

So moreover, DV is slightly lower for Tx1 than Tx2 and the DV goes down a little with each treatment during the period, then the starting value in the second period should go down - so not a clean washout between the periods.

Getting ready to leave work, but I will check back to see if there are any comments. The autoregressive correlation with the values going down will be the interesting part.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
My first crack at the simulation is below. I am sure there are better approaches but the coefficients are pretty close to my target values. I just need to tune this up and replicate over and over. I am guessing that requires an apply or one of those functions.

Code:
set.seed(1)
#Period 1 Tx1
Y111 = rnorm(15,30,2)
Y112 = Y111*rnorm(1,0.95,0.05)
Y113 = Y112*rnorm(1,0.95,0.05)
Y114 = Y113*rnorm(1,0.95,0.05)
Y115 = Y114*rnorm(1,0.95,0.05)

#Period 2 Tx2
Y221 = Y111 + (rnorm(1,4,2.5)) - (rnorm(1,4,1.5))
Y222 = Y221*rnorm(1,0.95,0.05)
Y223 = Y222*rnorm(1,0.95,0.05)
Y224 = Y223*rnorm(1,0.95,0.05)
Y225 = Y224*rnorm(1,0.95,0.05)

#Period 1 Tx2
Y211 = rnorm(15,35,2.5)
Y212 = Y211*rnorm(1,0.95,0.05)
Y213 = Y212*rnorm(1,0.95,0.05)
Y214 = Y213*rnorm(1,0.95,0.05)
Y215 = Y214*rnorm(1,0.95,0.05)

#Period 2 Tx1
Y121 = Y211 - (rnorm(1,5,2)) - (rnorm(1,4,1.5))
Y122 = Y121*rnorm(1,0.95,0.05)
Y123 = Y122*rnorm(1,0.95,0.05)
Y124 = Y123*rnorm(1,0.95,0.05)
Y125 = Y124*rnorm(1,0.95,0.05)
DV = c(Y111,
Y112,
Y113,
Y114,
Y115,
Y221,
Y222,
Y223,
Y224,
Y225,
Y211,
Y212,
Y213,
Y214,
Y215,
Y121,
Y122,
Y123,
Y124,
Y125)
str(DV)
DV

Tx = rep(c(1,2,2,1),times = c(75,75,75,75))

Period = rep(c(1,2,1,2),times = c(75,75,75,75))

Attempt=rep(1:5, times=4, each=15)

ID1 = rep(1:15, times=10, each=1)
ID2 = rep(16:30, times=10, each=1)
ID =  c(ID1, ID2)
MLM = data.frame(Tx, Period, Attempt, ID, DV)

#To visualize DV values next to each other
glimpse = data.frame(Y111,
Y112,
Y113,
Y114,
Y115,
Y221,
Y222,
Y223,
Y224,
Y225,
Y211,
Y212,
Y213,
Y214,
Y215,
Y121,
Y122,
Y123,
Y124,
Y125)

#### hlsmith

##### Less is more. Stay pure. Stay poor.
My data generating function for simulating multilevel (repeated measure) data. I simulated more data than the desired sample size called for, then grouped data into sets the size of the desired n-value in order to generate more than one simulation set. I am guessing this is appropriate enough but not the absolute best approach.

Code:
m=100
n=15*m
set.seed(1)
#Period 1 Tx1
Y111 = rnorm(n,30,3) + rnorm(n,0,1)
Y112 = Y111*rnorm(n,0.95,0.08) + rnorm(n,0,1)
Y113 = Y112*rnorm(n,0.95,0.08) + rnorm(n,0,1)
Y114 = Y113*rnorm(n,0.95,0.08) + rnorm(n,0,1)
Y115 = Y114*rnorm(n,0.95,0.08) + rnorm(n,0,1)

#Period 2 Tx2
Y221 = Y111 + (rnorm(n,4,3)) - (rnorm(n,4,1.5)) + rnorm(n,0,1)
Y222 = Y221*rnorm(n,0.95,0.08) + rnorm(n,0,1)
Y223 = Y222*rnorm(n,0.95,0.08) + rnorm(n,0,1)
Y224 = Y223*rnorm(n,0.95,0.08) + rnorm(n,0,1)
Y225 = Y224*rnorm(n,0.95,0.08) + rnorm(n,0,1)

#Period 1 Tx2
Y211 = rnorm(n,35,3) + rnorm(n,0,1)
Y212 = Y211*rnorm(n,0.95,0.08) + rnorm(n,0,1)
Y213 = Y212*rnorm(n,0.95,0.08) + rnorm(n,0,1)
Y214 = Y213*rnorm(n,0.95,0.08) + rnorm(n,0,1)
Y215 = Y214*rnorm(n,0.95,0.08) + rnorm(n,0,1)

#Period 2 Tx1

Y121 = Y211 - (rnorm(n,4,3)) - (rnorm(n,4,1.5)) + rnorm(n,0,1)
Y122 = Y121*rnorm(n,0.95,0.08) + rnorm(n,0,1)
Y123 = Y122*rnorm(n,0.95,0.08) + rnorm(n,0,1)
Y124 = Y123*rnorm(n,0.95,0.08) + rnorm(n,0,1)
Y125 = Y124*rnorm(n,0.95,0.08) + rnorm(n,0,1)
DV = c(Y111,
Y112,
Y113,
Y114,
Y115,
Y221,
Y222,
Y223,
Y224,
Y225,
Y211,
Y212,
Y213,
Y214,
Y215,
Y121,
Y122,
Y123,
Y124,
Y125)

Tx = rep(c(1,2,2,1),times = c(75*m,75*m,75*m,75*m))

Period = rep(c(1,2,1,2),times = c(75*m,75*m,75*m,75*m))

Attempt = rep(1:5, times=4, each=15*m)

mset = rep(c(1:m), times = 20, each = 15)

ID1 = rep(1:(n), times=10, each=1)
ID2 = rep(((n)+1):(n*2), times=10, each=1)
ID =  c(ID1, ID2)
MLMM = data.frame(Tx, Period, Attempt, ID, DV, mset)