Pooled estimates hazard ratio

klo

New Member
#1
Hi,

I builded a database with SPSS, and I used multiple imputation to impute missing data. Now I imported this database in R and trying to do further analysis.

Issue is that R doesn't know that is multiple imputed dataframe (5x), and get it as 1 data frame. I get wrong estimates and no pooled estimate.

Can someone point me in right direction?

Thanks,

K
 

Lazar

Phineas Packard
#2
The package Zelig will run some models with MI data but this is generally limited to . Otherwise you split the data by imputation and use apply functions and then write a few functions to calculate the pooled estimates. The pooled estimate equations are not that hard really so the functions are quite straightforward to write.
 

Lazar

Phineas Packard
#3
The other option is to not impute your data and estimate your models in lavaan using full information maximum likelihood estimation for missing. Being creative and using model constraints you can fit just about any model.
 

klo

New Member
#4
Thanks Lazar for your answers.

I will install Zelig package to see how it works.

First I though to write a function for calculating pooling estimate, because I dont want to change that database, later will be used with spss again for other analysis.

Im new to R, and question is, does R provide pooled estimate in output as SPSS shows? (never imputed data with R and dont know how is output)
 

Lazar

Phineas Packard
#5
It will depend. If you write your own function you can have it print out the results however you want - you could make it mimic SPSS if you really wanted. R is not just a statistical package but a programming language so you can have it do whatever you like. I cannot remember what zelig does and I am not sure what models it does it for.
 

klo

New Member
#6
I did a algorithm with R (to find confounder in model), which works perfectly.

Now, I just need to get pooled HR of my exposure. This seems not possible because dataframe has multiple imputed data, and as I mention above R cant figure this out. I dont get pooled estimates of HR.

To write a function for calculation pooled estimate, will be little complicated because has to write code for 50 models (which I have to test 50 variable for confounders).

Thanks,

K
 
Last edited:

Lazar

Phineas Packard
#7
This is ugly code and dense. As a new R user this may not make a lot of sense but to show it is possible:
Code:
#########Fake imputed data set######
library(MASS)
mydata<-data.frame(mvrnorm(1000, rep(0,3), matrix(c(1,.5,.3,
                                        .5,1,.25,
                                        .3, .25,1),3)))
mydata$imp<-rep(1:10, each=100)
############################################

#######Regression separately by imputations and save results##
# simple regression model y~x1 + x2
results<-sapply(split(mydata, mydata$imp), function(x) summary(lm(x[,1]~x[,2]+x[,3]))$coef[,1:2])
results<-t(results)
colnames(results)<-c('int.est',"x1.est","x2.est","int.se","x1.se","x2.se")
#############################################

##Pooled estimates####

#POOLED Point estimates
colMeans(results[,c(1:3)])
#Pooled standard errors
colMeans(results[,c(4:6)]) + (1+1/10)*(apply(results[,c(1:3)],2, var))
This gives:
Code:
#POOLED Point estimates
    int.est      x1.est      x2.est 
-0.01013974  0.40606096  0.18676873 
#Pooled standard errors
    int.se      x1.se      x2.se 
0.09043925 0.09342277 0.10186080
Anything in R is possible but it requires knowing how to get things coded. I would suggest taking some time to get equated with R coding or get someone who does to code you something up.