# Should my model include separate predictors?

#### stats20

##### New Member
Let's say I measured blood pressure on day1 and day2 three times a day (morning, afternoon and evening).
Code:
dat <- data.frame(ind=c(1,1,1,2,2,2,3,3,3,4,4,4), day1=c(90,113,122,86,84,95,114,126,123,115,92,103), day2=c(141,123,134,112,112,115,92,100,121,133,124,89), time=rep(c("morning","afternoon","evening"),times=4))

ind day1 day2      time
1   90  141   morning
1  113  123 afternoon
1  122  134   evening
2   86  112   morning
2   84  112 afternoon
2   95  115   evening
3  114   92   morning
3  126  100 afternoon
3  123  121   evening
4  115  133   morning
4   92  124 afternoon
4  103   89   evening
In R, I can model the data this way:
Code:
mod <- lm(day1 ~ day2, data=dat)
But I can also reshape dat in this way:
Code:
library(reshape2)
dat2 <- melt(dat, id.vars = c("ind","time"), variable.name="day")

ind      time    day    value
1   morning     day1    90
1 afternoon     day1   113
1   evening     day1   122
2   morning     day1    86
2 afternoon     day1    84
2   evening     day1    95
3   morning     day1   114
3 afternoon     day1   126
3   evening     day1   123
4   morning     day1   115
4 afternoon     day1    92
4   evening     day1   103
1   morning     day2   141
1 afternoon     day2   123
1   evening     day2   134
2   morning     day2   112
2 afternoon     day2   112
2   evening     day2   115
3   morning     day2    92
3 afternoon     day2   100
3   evening     day2   121
4   morning     day2   133
4 afternoon     day2   124
4   evening     day2    89
And do:

Code:
mod <- lm(value ~ day, data=dat2)
These are the same data but the model parameters are very different. Which way of modelling the data would be more appropriate?
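Here is a small base-R sketch of what each of these lm() fits estimates on the data above. To keep the snippet free of package dependencies, dat2 is rebuilt by hand instead of with reshape2::melt(); it has the same rows as the melted table. In the wide model, the day2 slope is the ordinary least-squares slope cov(day1, day2) / var(day2), which measures how day1 varies with day2 across matched rows. In the long model, the intercept is the mean of day1 and the dayday2 coefficient is mean(day2) - mean(day1). The two fits use the same numbers to answer two different questions (association vs. difference in means), which is why their parameters differ.

```r
# Example data from the question: one row per ind x time of day (wide format)
dat <- data.frame(
  ind  = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4),
  day1 = c(90, 113, 122, 86, 84, 95, 114, 126, 123, 115, 92, 103),
  day2 = c(141, 123, 134, 112, 112, 115, 92, 100, 121, 133, 124, 89),
  time = rep(c("morning", "afternoon", "evening"), times = 4)
)

# Long format built by hand (same rows as the melt() output):
# one row per ind x time x day
dat2 <- data.frame(
  ind   = rep(dat$ind, 2),
  time  = rep(dat$time, 2),
  day   = factor(rep(c("day1", "day2"), each = nrow(dat))),
  value = c(dat$day1, dat$day2)
)

# Wide model: how day1 varies with day2 across matched rows
mod1 <- lm(day1 ~ day2, data = dat)
# Long model: one mean per day
mod2 <- lm(value ~ day, data = dat2)

# The day2 slope in mod1 is the least-squares slope cov / var
slope <- cov(dat$day1, dat$day2) / var(dat$day2)
# In mod2 the intercept is mean(day1) and dayday2 is the difference in means
mean_diff <- mean(dat$day2) - mean(dat$day1)

print(coef(mod1)); print(slope)
print(coef(mod2)); print(mean_diff)
```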

#### hlsmith

##### Less is more. Stay pure. Stay poor.
What is the purpose of the model?
M1: gives you average predictions for 24 hours of time elapsed
M2: gives you predictions for time of day

Both models neglect to address the violation of independence between observations. Traditionally multilevel models would be used. How many ind do you have?

#### noetsi

##### No cake for spunky
Or time-based SEM models, or an ANOVA model with a time factor, depending on what you are testing. There are many options.

#### stats20

##### New Member
Thank you for your answer @hlsmith. Yes, I'd do this with lme4:
mod1 <- lmer(day1 ~ day2 + (1|ind), data=dat) and mod2 <- lmer(value ~ day + (1|ind), data=dat2).
So that means fitting either mod1 or mod2 is correct depending on the question. But what question would mod1 and mod2 answer?
You mention mod2 gives predictions for time of day, but time of day isn't included as a predictor in the model. Wouldn't that be the case if I had done mod2 <- lmer(value ~ day + time + (1|ind), data=dat2)? Then I would have predictions for time of day, or am I wrong?
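The lme4 formulas in this thread need one more pair of parentheses. In R's formula grammar, `|` binds more loosely than `+`. As a result, `day1 ~ day2 + 1|ind` is parsed as `day1 ~ (day2 + 1) | ind`, which is a single grouping term and not a fixed effect of day2 plus a random intercept for ind. The random intercept needs its own parentheses: `(1|ind)`. The sketch below checks the parse with base R only, so it runs without lme4 installed.

```r
# Build the formulas without fitting anything (base R only, lme4 not needed)
f_bad  <- day1 ~ day2 + 1 | ind
f_good <- day1 ~ day2 + (1 | ind)

# Element [[3]] of a two-sided formula is its right-hand side
rhs_bad  <- f_bad[[3]]
rhs_good <- f_good[[3]]

# f_bad: top-level operator is `|`, so the RHS is (day2 + 1) | ind,
# one grouping term that swallows day2
print(rhs_bad[[1]])   # the operator
print(rhs_bad[[2]])   # its left operand

# f_good: top-level operator is `+`, so day2 is a fixed effect and
# (1 | ind) is a separate random-intercept term
print(rhs_good[[1]])
print(rhs_good[[3]])
```

With lme4 loaded, the models from the question would then be written as lmer(day1 ~ day2 + (1 | ind), data = dat) and lmer(value ~ day + (1 | ind), data = dat2).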

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Would you want day1 to be explained by day2? What would be wrong with that?

#### stats20

##### New Member
Thanks @hlsmith. So with mod1, I can explain day1 by day2 (or the other way round), which I cannot do with mod2. But then what is the advantage of mod2? What does it answer that mod1 cannot? You mentioned above that M2 "gives you predictions for time of day", but I thought for that I would need to include "time" as a predictor: mod2 <- lmer(value ~ day + time + (1|ind), data=dat2), no?

My question is not so much specific to this example but more general: what does it mean to have separate predictors (day1, day2, ...) in the model versus one predictor with several levels (like day in mod2)?
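Here is a base-R sketch of what adding time changes. It uses lm() instead of lmer() so it runs without lme4, which means the random intercept for ind is left out. Without time, value ~ day gives one fitted value per day, and that value is the same for morning, afternoon and evening. With value ~ day + time, the fitted values also shift by time of day. Because this toy design is balanced (every day × time cell has four rows), the day coefficient is identical in both fits.

```r
dat <- data.frame(
  ind  = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4),
  day1 = c(90, 113, 122, 86, 84, 95, 114, 126, 123, 115, 92, 103),
  day2 = c(141, 123, 134, 112, 112, 115, 92, 100, 121, 133, 124, 89),
  time = rep(c("morning", "afternoon", "evening"), times = 4)
)

# Long format: day is one predictor with two levels
dat2 <- data.frame(
  ind   = rep(dat$ind, 2),
  time  = factor(rep(dat$time, 2)),
  day   = factor(rep(c("day1", "day2"), each = nrow(dat))),
  value = c(dat$day1, dat$day2)
)

# One factor (day): fitted values depend on day only
m_day  <- lm(value ~ day, data = dat2)
# Two factors (day + time of day): fitted values also shift by time of day
m_time <- lm(value ~ day + time, data = dat2)

# Predict every day x time combination from both fits
new <- expand.grid(day  = c("day1", "day2"),
                   time = c("morning", "afternoon", "evening"),
                   stringsAsFactors = FALSE)
new$pred_day  <- unname(predict(m_day,  newdata = new))
new$pred_time <- unname(predict(m_time, newdata = new))
print(new)

# Balanced design (4 rows per day x time cell): the day effect is unchanged
print(c(m_day = coef(m_day)[["dayday2"]], m_time = coef(m_time)[["dayday2"]]))
```

In other words, "day" as one factor answers "how do the day means differ?" (optionally adjusted for time of day), while day1 ~ day2 in wide format answers "how does one day's reading track the other's?".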

#### hlsmith

##### Less is more. Stay pure. Stay poor.
How much data and how many groups do you have? This may influence my response.

#### stats20

##### New Member
My actual data set has about 700 individuals (ind) and each individual has about 300 time points (time) for each of the two predictors (day1 and day2).

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Explain the last part in more detail. A person has 300 day1 and 300 day2 data points? And these are for, say, Day1 and Day2?

So why isn't this just time series data?

#### stats20

##### New Member
Yes, a person has 300 data points in day1 and 300 data points in day2. It is time series data that I'm trying to model with a mixed model.
ind day1 day2 time
1 90 141 1
1 113 123 2
1 122 134 3
1 86 112 4
.............................................
1 22 131 300
2 12 333 1
..............................................

The question is what questions can I answer with mod1 versus mod2?
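With ~300 consecutive readings per person, it is worth checking whether residuals from neighbouring time points are correlated before settling on either mixed model. A random intercept for ind does not account for that correlation, which is the point behind hlsmith's time-series question. Below is a minimal base-R sketch. The helpers lag_within() and lag1_resid_cor() are hypothetical, written only for illustration. Both assume the rows are sorted by time within each ind and that the fit dropped no rows for missing values. The demo uses the four rows posted above for ind 1.

```r
# Hypothetical helper: lag x by one step within each id.
# Assumes rows are already sorted by time within id.
lag_within <- function(x, id) {
  ave(x, id, FUN = function(v) c(NA, head(v, -1)))
}

# Hypothetical helper: lag-1 correlation of a fit's residuals within each id.
# Assumes the fit used every row (no rows dropped for NAs), so residuals line up with id.
lag1_resid_cor <- function(fit, id) {
  r <- residuals(fit)
  cor(r, lag_within(r, id), use = "complete.obs")
}

# The first rows posted for ind 1 (time points 1 to 4)
dat <- data.frame(
  ind  = 1,
  time = 1:4,
  day1 = c(90, 113, 122, 86),
  day2 = c(141, 123, 134, 112)
)
# The lag restarts at NA for each new ind, so values never leak across people
dat$day1_lag1 <- lag_within(dat$day1, dat$ind)
print(dat)
```

On the full data, you would call something like lag1_resid_cor(lm(day1 ~ day2, data = dat), dat$ind). A value well above zero suggests the time-series structure needs to be modelled rather than absorbed by a random intercept.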

#### hlsmith

##### Less is more. Stay pure. Stay poor.
What is the difference between day1 and day2? Is there a gap between the last day1 value and the first day2 value or are all values evenly spaced out?

#### stats20

##### New Member
To make it simple, let's say I collected 300 data points of heart rate and 300 data points of blood pressure, acquired at the same time, for each person. Time points are in seconds, so 300 seconds of continuously recorded data.

If my dat is:
Code:
ID   heart_rate   blood_pressure   time_point
1            90              141            1
1           103              123            2
1           102              134            3
1            76              112            4
.............................................
1            90              131          300
2            70              189            1
..............................................
700          80              150          300
and dat2 is coded:
Code:
ID   variable         value   time_point
1    heart_rate          90            1
1    heart_rate         103            2
1    heart_rate         102            3
1    heart_rate          76            4
.........................................................
1    blood_pressure     141            1
1    blood_pressure     123            2
1    blood_pressure     134            3
1    blood_pressure     112            4
.........................................................
1    blood_pressure     131          300
2    heart_rate          70            1
.........................................................
700  blood_pressure     150          300
The same question applies:

What is the difference between mod1 <- lmer(heart_rate ~ blood_pressure + (1|ID), data=dat) and mod2 <- lmer(value ~ variable + (1|ID), data=dat2)? What can each of these models tell me?

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Regardless of the posted model code, what is the study question? And heart rate and blood pressure are their own variables, right? The above makes it look like you have them listed in the same column, which doesn't make sense.
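Here is a base-R sketch of the "same column" issue, using the four ID 1 rows posted above. It uses lm() instead of lmer(), since the random intercept does not matter for this point. Once heart rate and blood pressure are stacked into one value column, value ~ variable just estimates a mean per measure. Its coefficient is therefore mean(blood_pressure) - mean(heart_rate), a difference between quantities on different scales (beats per minute vs. mmHg, assuming the usual units). The wide model instead estimates how heart rate changes per unit of blood pressure.

```r
# First four rows for ID 1 as posted (wide: one column per measure)
dat <- data.frame(
  ID = 1, time_point = 1:4,
  heart_rate     = c(90, 103, 102, 76),
  blood_pressure = c(141, 123, 134, 112)
)

# Long format: both measures stacked into one `value` column
dat2 <- data.frame(
  ID         = rep(dat$ID, 2),
  time_point = rep(dat$time_point, 2),
  variable   = factor(rep(c("heart_rate", "blood_pressure"), each = nrow(dat)),
                      levels = c("heart_rate", "blood_pressure")),
  value      = c(dat$heart_rate, dat$blood_pressure)
)

# Wide model: how heart rate moves with blood pressure
mod1 <- lm(heart_rate ~ blood_pressure, data = dat)
# Long model: one mean per measure; the coefficient is a difference of means
mod2 <- lm(value ~ variable, data = dat2)

print(coef(mod1))
print(coef(mod2))
```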

#### Buckeye

##### Active Member
I agree with hlsmith. Typically, we start with a research question and then determine what approach works best with the data we have. Ideally, collect data after developing the research question.
