Log transformation and multiple imputation

#1
Hello, below is a part of an assignment. Can someone tell me whether I have to perform log transformation before or after multiply imputing the data?

  1. Download the School readiness data from blackboard (Part 4 – Joost van Ginkel). The data set consists of the following variables: The child’s Gender, Mother’s ethnicity, School readiness, Mother’s depression score, Mother’s educational level, Mother’s age, Family income, Mother’s IQ, and the Child’s IQ. The data set contains missing values which have to be multiply imputed before carrying out statistical analyses.

  2. Carry out multiple imputation (Rubin, 1987), using all variables in the imputation procedure. Also include the two-way interactions among categorical predictors (see, computer lab session 2). Before running the imputation procedure, paste the command in the syntax first. The data set is too large to carry out a multiple imputation using the default settings in SPSS. The settings may only be changed using the syntax. You can change the settings in the syntax by adding an additional command to the /IMPUTE subcommand: MAXMODELPARAM=100000.
    Next, we are interested in how the child’s school readiness is predicted from mother’s depression score, mother’s age, family income, mother’s IQ, and the child’s IQ. Carry out this regression analysis. Note that some of the numeric variables are highly skewed which have to be log transformed (In SPSS: COMPUTE [new variable name] = ln([old variable name])) before they can be used for the statistical analysis.
    Carry out the regression analysis using Mixed Models rather than the standard regression option in SPSS, and save both the regression coefficients and the covariance matrices of the regression coefficients to an SPSS data file using the OMS option (see the manual MI-mul2manual.pdf of the MI-mul2.sps syntax file by Van Ginkel (2010), pp. 5-8). We need this output for later analyses.
    The following things are important to think about:
    Whether you perform the log transformation before or after multiply imputing the data. Note: Don’t try to find the answer to this question in the literature because the literature will not be of any help here. The idea is that you try to reason yourself which option will be correct in this case and which option will be incorrect, and what exactly will go wrong when you choose the incorrect option.
 

hlsmith

Not a robit
#2
This looks like an assignment, so please post what you think the answer is and your rationale. I will check back and provide what I would do.

Hint, if IQ is a continuous variable are you doing MI via OLS or what procedure? If OLS, what are its assumptions?
 
#3
What's OLS, please?
 
#7
I would think to do the transformation first, to make sure the normality assumption is met. However, I do not know whether I create more bias by doing that. Doesn't it depend on whether the relationship is linear before the transformation?
 

hlsmith

Not a robit
#8
That's not clear. So you are using ordinary least squares (OLS) for the imputation, yes or no? Also, to confirm, you say you are using multiply [sic] imputation. Is that correct, too? So you are using OLS 'm' times?
 
#9
I can use either linear regression or predictive mean matching for the multiple imputation. Which would you prefer? And yes, I run OLS or predictive mean matching 10 times; the number of imputations is 10. But when should I do the log transformation? Does that depend on whether the relationship between the independent variable(s) and the dependent variable is linear?
 

hlsmith

Not a robit
#10
Can you describe this in more detail, "OLS/predictive mean matching 10 times"?

This can go three ways. First off, why does the variable need to be transformed? Is it highly skewed? There is a chance that the skewness affects the final model but not the imputation model, or vice versa, or neither. Still, I bet the final approach should be to transform the skewed variable and use the transformed version in every aspect of the analysis. Lastly, you will need to remember to interpret the transformed variable correctly in the final outcome model.
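To make the "imputation model versus final model" point concrete, here is a rough Python sketch of one thing that can go wrong when a skewed, strictly positive variable is imputed on the raw scale with a normal linear model and logged afterwards. Everything in it is an assumption for illustration: the data are simulated, the names `income` and `impute_normal` are made up, and the imputation is a simplified single draw (a proper MI procedure would also draw the regression parameters).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a right-skewed, strictly positive variable
# that depends on a fully observed predictor x.
n = 2000
x = rng.normal(size=n)
income = np.exp(1.0 + 0.8 * x + rng.normal(scale=0.7, size=n))

# Make roughly 30% of income missing completely at random.
missing = rng.random(n) < 0.3
obs = ~missing

def impute_normal(y_obs, x_obs, x_mis, rng):
    """One stochastic regression imputation: OLS prediction plus normal noise."""
    X = np.column_stack([np.ones_like(x_obs), x_obs])
    beta, *_ = np.linalg.lstsq(X, y_obs, rcond=None)
    resid = y_obs - X @ beta
    sigma = np.sqrt(resid @ resid / (len(y_obs) - 2))
    X_mis = np.column_stack([np.ones_like(x_mis), x_mis])
    return X_mis @ beta + rng.normal(scale=sigma, size=len(x_mis))

# Option A: impute on the raw scale, log-transform afterwards.
raw_imputed = impute_normal(income[obs], x[obs], x[missing], rng)
n_nonpositive = int((raw_imputed <= 0).sum())

# Option B: log-transform first, then impute on the log scale.
log_imputed = impute_normal(np.log(income[obs]), x[obs], x[missing], rng)

print("raw-scale imputations <= 0:", n_nonpositive)
print("log-scale imputations all finite:", bool(np.isfinite(log_imputed).all()))
```

In this toy setup, some raw-scale imputations come out zero or negative, so taking ln afterwards fails for those cases. Whether that happens in the real data depends on the variables and on the imputation method chosen.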

Side note: it seems you were assigned to do the imputation while ignoring that you will eventually fit a multi-level model. That may not be best practice in real life, but if that is the assignment, it is probably fine in this scenario.
 
#11
This is what I found elsewhere in an article on multiple imputation:
Don’t transform skewed variables. Likewise, when you transform a variable to meet normality assumptions before imputing, you not only are changing the distribution of that variable but the relationship between that variable and the others you use to impute. Doing so can lead to imputing outliers, creating more bias than just imputing the skewed variable.

Some variables are indeed highly skewed. So I have to do a log transformation of several variables.

And before the log transformation I even have to do another transformation. My data set contains skewed variables with a minimum value of zero. The natural logarithm of 0 does not exist, so for these variables a log transformation will fail. An additional transformation has to be carried out before the log transformation, such that the minimum value is greater than 0. Which transformation do I have to use?
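For what it's worth, the zero problem is commonly handled by adding a positive constant before taking the logarithm. Below is a minimal Python sketch with made-up values; the constant `c = 1.0` is an assumption and choosing it is a judgment call. In SPSS the analogue would presumably be COMPUTE [new variable name] = ln([old variable name] + 1).

```python
import numpy as np

# Hypothetical skewed variable with a minimum of zero.
scores = np.array([0.0, 0.0, 1.0, 2.0, 5.0, 12.0, 40.0])

# Option 1: add a constant c so that min(scores + c) > 0, then take ln.
c = 1.0  # assumed constant, not prescribed by the assignment
log_shifted = np.log(scores + c)

# Option 2: np.log1p computes ln(1 + x) directly, which matches c = 1.
log1p_scores = np.log1p(scores)

# Back-transform to the original scale, e.g. for reporting.
back = np.exp(log_shifted) - c

print(log_shifted)
print(back)
```

The back-transformation has to undo the same constant, otherwise results reported on the original scale are shifted.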
 

hlsmith

Not a robit
#12
I would look at Box-Cox transformations; for data containing zeros, the usual fix is just to add a constant to the variable first. If you are using OLS for imputation, I would do it with and without the transformation and see what the residuals look like in those models. The residuals are likely to dictate whether a transformation is warranted. Using imputation also assumes the data are Missing At Random. You are told to do this here, but in real life you would want to examine that assumption.
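A rough sketch of that residual check in Python, using simulated data: fit the same OLS model with the raw and with the log-transformed variable, then compare the skewness of the residuals. The variable names and the helper functions `ols_residuals` and `skewness` are made up for illustration, and sample skewness is only one of several ways to look at residuals (plots are usually more informative).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical skewed variable y to be imputed from a predictor x.
n = 1000
x = rng.normal(size=n)
y = np.exp(0.5 + 0.6 * x + rng.normal(scale=0.8, size=n))

def ols_residuals(y, x):
    """Residuals from an OLS fit of y on an intercept and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def skewness(r):
    """Sample skewness: third central moment over variance to the power 1.5."""
    r = r - r.mean()
    return float((r**3).mean() / (r**2).mean() ** 1.5)

skew_raw = skewness(ols_residuals(y, x))          # model without transformation
skew_log = skewness(ols_residuals(np.log(y), x))  # model with log transformation

print("residual skewness, raw y:", round(skew_raw, 2))
print("residual skewness, log y:", round(skew_log, 2))
```

With data simulated this way the log-scale residuals are close to symmetric and the raw-scale residuals are not, but that is built into the simulation. With real data the comparison could go either way.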