Multi-variate regression for a small sample

I am not a statistician and I am only a beginner in this field.
I would really appreciate any help on this subject of regression.

I have a sample size of 37 with 9 predictors.

The predictors are family size (categorical, so it is converted to dummy variables), total number of appliances (scale), total number of rooms (scale), total appliance usage hours (scale), tariff price of electricity (scale), income group (categorical, so it is converted to dummy variables), etc.

The DV is energy consumption (a scale variable, normally distributed after log transformation).
I want to know which of these variables actually impact consumption, using stepwise regression.
I know this is a terrible sample size. Are there any tests I need to run to make sure I am not finding false positives? I would greatly appreciate any recommendations, or any sources I can refer to as a beginner.
The post-hoc power analysis gives H1 ρ² = 0.31 and H0 ρ² = 0, with power (1 − β) = 0.79.

I want to know if this is a terrible way to approach this, or whether I should just stick with descriptive statistics and correlations between these variables.


No cake for spunky
37 is a really small sample. Particularly for 9 variables.

Stepwise regression is pretty much universally frowned on these days. The threads I did when I was learning Lasso this week show a much better approach (Lasso). It is not much harder than stepwise. You don't, in any case, use stepwise to predict. You use it to throw variables out of your model. While that is useful with so few cases, you want to end up using linear regression, not stepwise regression, for the final model. The same is true with Lasso, although I am not sure it will work with so few cases. Can you gather more data?

Power is a good test to start with. Examining the residuals from the linear regression and testing the assumptions of regression is also a good idea. If the residuals look bad with so few cases, you might want to rethink your method, particularly your statistical tests.

Which software are you using?
Thank you for the response. Unfortunately, that is the data I am stuck with.
The residuals seem alright (they lie between -2 and 2.1) with no patterns.
I used SPSS. I also thought of using a generalised linear model with a logit link, as all my predictors are not normal (even though that is not required), and it seems like regression requires several conditions to be met prior to analysis, especially with small sample sizes.


No cake for spunky
I have solutions for SAS (and you can do them for R) but I have not used SPSS in many years.

You want to look for non-linearity and heteroskedasticity in your data by looking at the residual plots, not a test. I am not sure what you mean by lie between ....
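A rough sketch of what "looking at the residual plots" means in practice, in Python rather than SPSS or SAS. Everything here is an assumption for illustration: the data are simulated (37 made-up cases), only one hypothetical predictor (`hours`) is used to keep the fit short, and `plot_residuals` assumes matplotlib is installed. With the real data you would plot the residuals from the full model against its fitted values.

```python
import random
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def fitted_and_residuals(x, y):
    """Fitted values and raw residuals (observed minus fitted)."""
    intercept, slope = fit_line(x, y)
    fitted = [intercept + slope * a for a in x]
    resid = [b - f for b, f in zip(y, fitted)]
    return fitted, resid

def plot_residuals(fitted, resid):
    """Residuals vs fitted values; assumes matplotlib is available."""
    import matplotlib.pyplot as plt
    plt.scatter(fitted, resid)
    plt.axhline(0, linestyle="--")
    plt.xlabel("Fitted values")
    plt.ylabel("Residuals")
    plt.show()

# Simulated stand-in data, NOT the poster's data.
random.seed(1)
hours = [random.uniform(1, 10) for _ in range(37)]
log_kwh = [1.0 + 0.3 * h + random.gauss(0, 0.5) for h in hours]

fitted, resid = fitted_and_residuals(hours, log_kwh)
```

Calling `plot_residuals(fitted, resid)` draws the plot. A curve in the cloud of points suggests non-linearity, and a funnel shape (spread growing or shrinking with the fitted value) suggests heteroskedasticity. A range such as "-2 to 2.1" on its own says nothing about either.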

See if SPSS does LASSO. If it does, that is likely the way you want to go.
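For anyone unsure what LASSO actually does, here is a minimal sketch in plain Python (not SPSS or SAS syntax). It is an illustration under stated assumptions, not a recommended analysis: the data are simulated with 37 cases and 9 predictors of which only the first two matter, the penalty `lam=0.5` is picked by hand (normally it is chosen by cross-validation, which is itself unstable at this sample size), and the function names are my own. Predictors are standardised first so that the penalty treats them all on the same scale.

```python
import random
from statistics import mean, pstdev

def standardize(columns):
    """Scale each predictor column to mean 0 and (population) SD 1."""
    out = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        out.append([(v - m) / s for v in col])
    return out

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso(columns, y, lam, n_iter=200):
    """Coordinate descent for (1/2n) * RSS + lam * sum(|b_j|).

    Assumes the columns are already standardized; y is centered here.
    """
    n = len(y)
    y_mean = mean(y)
    resid = [v - y_mean for v in y]
    beta = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Covariance of column j with the residual that excludes its own effect.
            rho = sum(c * (r + beta[j] * c) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * c for c, r in zip(col, resid)]
                beta[j] = new
    return beta

# Simulated stand-in data, NOT the poster's data: only predictors 0 and 1 matter.
random.seed(0)
n, p = 37, 9
X_cols = [[random.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [2.0 * X_cols[0][i] - 1.5 * X_cols[1][i] + random.gauss(0, 1) for i in range(n)]

Z = standardize(X_cols)
beta = lasso(Z, y, lam=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("coefficients:", [round(b, 3) for b in beta])
print("kept predictors:", selected)
```

Predictors whose coefficient is shrunk to exactly zero are the ones thrown out. The surviving coefficients are biased toward zero, which is why the final model is usually an ordinary linear regression refit on the kept predictors (not shown here). With dummy-coded categorical predictors, the dummies of one variable can be dropped individually, which may or may not be what you want.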


Less is more. Stay pure. Stay poor.
n=37 is nominally too small to generalize from, correct? What is the target population and how big is it? A select set of descriptive stats is likely your go-to here. Post-hoc power analyses are not valid: they are run after the results have been seen, or after the results have influenced the approach and variable selection.

Thanks and welcome to the forum.
Thank you all,
Do you think LASSO might work?
Should I standardise all predictors?
Are there tests, like power analysis, that I should run beforehand?

This dataset is the only data I have, unfortunately.
I wanted to extract a methodology and understand the initial findings, but I have been getting frowns with this dataset and regression. I think going case by case or staying descriptive may be better, then.