Sample Size Affecting my Estimates

I’m running an analysis on survey data from the National Survey on Drug Use and Health (NSDUH), pooling multiple years. I reduced each year’s dataset to the variables of interest and stacked them into one merged set using PROC APPEND in SAS 9.4.
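For reference, the pooling step looks roughly like this. This is a sketch, not my exact code: the dataset names (nsduh2019, nsduh_all) and the analysis variables (outcome, covar1, covar2) are placeholders; the design variables (VESTR, VEREP, ANALWT_C) are the names used in the NSDUH public-use files.

```sas
/* Keep only the variables of interest from one year, then stack it onto    */
/* the pooled file. Repeat per year. Names below are placeholders except    */
/* the NSDUH design variables (vestr, verep, analwt_c).                     */
data y2019;
    set nsduh2019 (keep = vestr verep analwt_c outcome covar1 covar2);
run;

proc append base = nsduh_all data = y2019 force;
run;
```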

I’m running PROC SURVEYLOGISTIC with the complex sampling variables (stratum, cluster, and weight) so the sampling design is reflected in my estimates. Everything is suspiciously significant. Suspicious because when I run the model on a single year, some variables come out non-significant. I’m afraid the complex sampling variables aren’t actually adjusting the estimates, so I randomly split the sample into training and test subsamples for analysis. When I do this I get some non-significant covariates, which feels right.
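The model is specified roughly like this (a sketch, with outcome and the covariates as placeholders; the design variables are the standard NSDUH ones). One note for pooled files: the SAMHSA documentation suggests dividing the person-level weight by the number of years combined, which I may or may not be doing correctly:

```sas
/* Survey-adjusted logistic model. vestr = stratum, verep = cluster/PSU,    */
/* analwt_c = person-level analysis weight (NSDUH public-use names).        */
proc surveylogistic data = nsduh_all;
    strata  vestr;
    cluster verep;
    weight  analwt_c;   /* for pooled years, consider analwt_c / #years */
    class covar1 covar2 / param = ref;
    model outcome (event = '1') = covar1 covar2;
run;
```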

My questions: am I doing something wrong with the full sample, and is a train/test approach inappropriate when I’m not really working with many variables? I would like to use the full sample. Any help is appreciated.


Omega Contributor
Do you just have an over-powered analysis? Meaning you can find statistical significance even when the results aren't really contextually meaningful (e.g., small effect sizes). If you run the model on one subsample and then the same model on the other subsample, are the estimates pretty much identical (not scoring it, just comparing the effects)? Next, are these effects the same as when the model is applied to the full sample, with the only difference being that the standard errors for the subsamples are larger? Traditionally, the SE is calculated as s/sqrt(n), so the bigger the sample size, the easier it is to show a marginal effect as significant.
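A toy illustration of that s/sqrt(n) point, with made-up numbers: the same small effect and the same spread produce a shrinking standard error (and a growing test statistic) as n rises, which is all "overpowered" means here.

```sas
/* Toy demo of SE = s / sqrt(n): fixed effect (0.5) and spread (s = 10),    */
/* but the standard error shrinks and t = effect/se grows as n grows.       */
/* All numbers are invented for illustration.                               */
data se_demo;
    s      = 10;
    effect = 0.5;
    do n = 100, 1000, 50000;
        se = s / sqrt(n);
        t  = effect / se;
        output;
    end;
run;

proc print data = se_demo;
run;
```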

What are the weights correcting for — just up- and down-weighting sample characteristics to make the sample more reflective of the superpopulation?

All of my independent variables are categorical (binary, ordinal, and nominal). If I do model selection on the training subsample, one of the variables that was significant in the full sample drops out. Odds ratios differ by anywhere from 1% to 80% between the models fit on the training and test subsamples. The larger training subsample's odds ratios differ by only 1% to 11% from the full-sample model.

The cluster, stratification, and weight variables account for the sampling methodology; national survey datasets provide them so that variance estimates reflect the design. How do I know if the full sample is overpowered?