Heterodastic thread on heteroskedasticity

spunky

Doesn't actually exist
#1
A follow-up the conversation that we were having. Basically, the idea that for a simulated, linear mixed effects model of the form \(Y = 0.5 + 0.5X_{1} + u_{0} + \epsilon_{ij}\) where \(u_{0} \sim N(0, 0.7)\) and \(\epsilon \sim N(0, 0.3)\) to get an intra-class correlation (ICC) of 0.7. ends up not showing anything even remotely heteroskedastic in the plot. Also, when you simulate data and test the residuals using the Breusch-Pagan test, the p-value is not significant. HOWEVER the Breusch-Godfrey of serial correlation *is* significant. Which low key took me by surprise because I assumed random intercept models are not necessarily auto-regressive models of lag 1 (which is what the test in R shows... I think?)

Anyhoo, code:

Code:
library(ggplot2)
library(lmtest)
set.seed(123)##para fig 4

library(lme4)
library(MASS)

Simulate <- function(k, n, icc, beta, sigma=NA, sigmaXre=NA){

nk <- n*k
nx <- length(beta)-1
Z <- kronecker(diag(k), rep(1,n))

# X matrix
if (is.na(sigma[1])) sigma <- diag(nx)
X <- mvrnorm(nk, rep(0,nx), sigma)

# random effects of X
if (!is.na(sigmaXre[1])){
Xre <- t(matrix(rnorm(k*nx,rep(1,nx),sigmaXre),nrow=nx))
Xre <- cbind(rep(1,nk), X * (Z %*% Xre))
} else {
Xre <- cbind(rep(1,nk), X)
}
X <- cbind(rep(1,nk), X)

# create a factor to keep track of which "students" are in which "school"
group <- as.factor(rep(1:k, each=n))

# generate taus and epsilons
tecov <- diag(c(icc, rep(1-icc,n)))
te <- mvrnorm(k, rep(0,n+1), tecov)
epsilons <- as.vector(t(te[,-1]))
taus <- te[,1]

# generate Y data
ran <- Z %*% taus + epsilons
Y <- Xre %*% beta + ran

output <- list(Y, X[,2], group)
class(output) <- "data.frame"
colnames(output) <- c("Y", "X1", "group")
return(output)
}



# How many "schools"?
k <- 100
# How many "students" per "school"?
n <- 100
# Which ICC?
real.icc <- 0.7
# Which fixed effects?
beta <- c(0.5,0.5)
# Covariance matrices of the X variables (a positive-definite symmetric matrix)
sigma <- matrix(c(1), length(beta)-1)
# Covariance matrix of the random effects of X (or vector of variances where zero means no random effect)
sigmaXre <- matrix(c(rep(0,1)), length(beta)-1) #intercept-only model

data1 <- Simulate(k, n, real.icc, beta, sigma, sigmaXre)

Y  <-c(data1$Y)
X1 <-c(data1$X1)


model <- lm(Y~ X1)
Tests:
Code:
> bptest(model)

        studentized Breusch-Pagan test

data:  model
BP = 0.14362, df = 1, p-value = 0.7047

> ols_test_score(model)
Error in ols_test_score(model) : could not find function "ols_test_score"
> bgtest(model)

        Breusch-Godfrey test for serial correlation of order up to 1

data:  model
LM test = 4729, df = 1, p-value < 2.2e-16
 

Attachments

hlsmith

Not a robit
#2
My brain can't really process that residual plot. Something on the subconscious level feels off about it, perhaps the shear density in the center. Also, how do your standardized tests of residuals function when playing around with samples sizes?
 

Dason

Ambassador to the humans
#3
I didn't look too much into it but if your data is arranged such that observations in the same group are in consecutive rows then serial correlation being detected makes sense since consecutive rows share a common random effect.
 

spunky

Doesn't actually exist
#4
I didn't look too much into it but if your data is arranged such that observations in the same group are in consecutive rows then serial correlation being detected makes sense since consecutive rows share a common random effect.
Pfff... you're so right. I'm so dumb sometimes. Thanks for pointing it out!
 

Attachments

hlsmith

Not a robit
#6
I just meant it is so dense with observations it doesn't remind me of the shotgun spays I am used to. I doubt it, but there could be irregularities in it not discernible to the naked eye because the transparency is lost after marginal overlap of observations. But it is likely fine, just may brain is having issues.
 

spunky

Doesn't actually exist
#7
I just meant it is so dense with observations it doesn't remind me of the shotgun spays I am used to. I doubt it, but there could be irregularities in it not discernible to the naked eye because the transparency is lost after marginal overlap of observations. But it is likely fine, just may brain is having issues.
Well, the first part of my code has the DGP (Data Generation Process) so you can always reduce the simulated sample sizes to have a better look. Right now I'm doing 100 level 1 units (e.g. "kids") in 100 level 2 units (e.g. "classrooms") so if this were real data, my total sample size would be 10,000 "kids" ... so yeah, a lot. LOL.

But if you reduce the sample and plot them, the residual plot looks fairly circular.