Ignoring nested data structure?

#1
Hi! Is ignoring the nested data structure harmful for descriptive statistics? Or is it only a problem once you introduce independent variables to test effects/associations?

I am identifying clusters (using latent class analysis) of respondents from a cross-national data set. After this, I will do analyses with covariates in MLwiN (multilevel structure), but I am wondering if I have already made a big mistake by ignoring the nested structure in the LCA?
 

hlsmith

Less is more. Stay pure. Stay poor.
#2
I guess it depends on the purpose and how you are using the results. I believe that in the past I have fit empty models (no covariates) that still controlled for the clusters. Can you tell us more about your context?
 

hlsmith

#3
Wait, does this mean people could contribute more than one observation, and that those values would be correlated? If so, fitting an empty model that controls for the covariance structure is necessary. I usually liken this to allowing a person to vote twice or more. In an inferential (rather than purely descriptive) setting, not controlling for it would make that person's attributes seem more strongly associated with the outcome than they really are.
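For anyone who wants to see the "voting twice" point in numbers, here is a minimal Python sketch (standard library only). It simulates clustered data and estimates the intraclass correlation (ICC), i.e. the share of variance that sits between clusters, using the one-way ANOVA estimator for equal-sized clusters. The function names, cluster counts, and variance settings are made up for illustration; this is not what MLwiN or Latent GOLD compute internally.

```python
import random
import statistics


def simulate_clustered(n_clusters, n_per_cluster, sd_between, sd_within, seed=0):
    """Simulate balanced clustered data: one shared effect per cluster plus noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_clusters):
        cluster_effect = rng.gauss(0.0, sd_between)
        data.append(
            [cluster_effect + rng.gauss(0.0, sd_within) for _ in range(n_per_cluster)]
        )
    return data


def icc_oneway(data):
    """One-way ANOVA estimator of the ICC; assumes equal-sized clusters."""
    k = len(data)      # number of clusters
    n = len(data[0])   # observations per cluster
    cluster_means = [statistics.fmean(group) for group in data]
    # With balanced clusters the mean of cluster means equals the grand mean.
    grand_mean = statistics.fmean(cluster_means)
    ms_between = n * sum((m - grand_mean) ** 2 for m in cluster_means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for group, m in zip(data, cluster_means) for x in group
    ) / (k * (n - 1))
    return (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)


if __name__ == "__main__":
    # Hypothetical settings: equal between- and within-cluster variance.
    correlated = simulate_clustered(200, 20, sd_between=1.0, sd_within=1.0, seed=1)
    # Hypothetical settings: no cluster effect at all.
    independent = simulate_clustered(200, 20, sd_between=0.0, sd_within=1.0, seed=1)
    print("ICC with cluster effect:", round(icc_oneway(correlated), 3))
    print("ICC without cluster effect:", round(icc_oneway(independent), 3))
```

An ICC near zero suggests the clustering adds little; a clearly positive ICC means observations within a cluster are not independent pieces of information.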
 
#4
Thank you for your answer!

I am identifying types of political participants (using latent class analysis) from a dataset containing 30+ countries. I am not using the multilevel option in Latent GOLD; I end my analysis in Latent GOLD by assigning each person to a cluster (modal classification). Then I am planning to use MLwiN to introduce macro-effects on being assigned to a certain class ("which variables help explain why some participant types are more common in certain countries?").

So I guess the LCA is purely descriptive, with no covariates introduced yet. Is it a problem if I do the LCA at a single level, then, ignoring the fact that respondents are nested in countries?
 

hlsmith

#5
Your context is different from mine. Mine is usually medical patients who appear in the dataset more than once (so observations clustered in patients) or patients clustered in hospitals.

Controlling for it allows you to separate the between-group from the within-group variability and determine whether the two differ. Ignoring it means you are leaving that structure out, so your standard errors will be too small and you could make a type I error.
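To put a rough number on "standard errors will be too small", here is a small Python sketch of the usual design-effect approximation, deff = 1 + (m - 1) * ICC, where m is the cluster size. It assumes equal-sized clusters, and the figures in the usage lines (30 countries, 1,000 respondents each, ICC of 0.05) are hypothetical, not taken from the dataset discussed here.

```python
import math


def design_effect(cluster_size, icc):
    """Approximate variance inflation from clustering (equal cluster sizes assumed)."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_total, cluster_size, icc):
    """Number of independent observations the clustered sample is roughly worth."""
    return n_total / design_effect(cluster_size, icc)


def se_underestimation_factor(cluster_size, icc):
    """Factor by which a naive (independence-assuming) standard error is too small."""
    return math.sqrt(design_effect(cluster_size, icc))


if __name__ == "__main__":
    # Hypothetical example: 30 countries x 1,000 respondents, ICC = 0.05.
    n_countries, per_country, icc = 30, 1000, 0.05
    n_total = n_countries * per_country
    print("design effect:", design_effect(per_country, icc))
    print("effective n:", round(effective_sample_size(n_total, per_country, icc)))
    print("naive SE too small by factor:", round(se_underestimation_factor(per_country, icc), 2))
```

Even a small ICC can shrink the effective sample size a lot when clusters are large, which is why naive standard errors look narrower than they should.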

You can do what you deem appropriate; you just need to be very transparent, so that your audience understands how the data were generated and what the results represent.