Multivariate Regression - I think!

I'm performing a psychological study about gender differences in attraction. Participants select between a number of figures that vary in Waist-To-Hip Ratio (WHR) or Shoulder-To-Hip Ratio (SHR). Figures varying in WHR/SHR also covary in BMI, so I ask participants to estimate the BMI of their chosen figure, and plan to factor this variance out.
Thus, my aim is to find out if, with variance due to BMI removed, the genders vary in their selection of the most attractive figure - does this sound ok up to this point?

I initially planned to use a Two-Way ANOVA, as this allows you to factor out a variable. However, I think that this can only be used with nominal independent variables, such as gender? And EstimatedBMI and FigureChoice are scale variables (there are 9 figures to choose between).

I then thought of multivariate regression as another technique that allows one to isolate how much of the variance is explained by different factors. My thought is that my correlation table would consist of FigureChoice, EstimatedBMI, and Gender. How does this sound - am I on the right track?

If so, does this translate in SPSS13 into Analyze > General Linear Model > Multivariate...? If so, are FigureChoice and EstimatedBMI dependent variables and gender a Fixed Factor? Or perhaps EstimatedBMI would be a covariate?

Many thanks for your help,


TS Contributor
I would say that Figure Choice is the dependent variable, Gender is the independent variable, and Estimated BMI is a covariate.

An ANOVA would work fine here - you would have a categorical independent variable (Gender), a scale-type dependent variable (Figure Choice), and a covariate (Estimated BMI).
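For what it's worth, an ANCOVA of this shape is just a regression with a dummy-coded factor plus a continuous covariate, so the SPSS output can be sanity-checked by hand. A minimal sketch in Python/NumPy - all numbers and variable names here are invented for illustration: fit the model with and without Gender, and turn the drop in residual sum of squares into the F statistic for the Gender effect, adjusted for BMI.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: FigureChoice (roughly 1-9) driven by Gender and EstimatedBMI
n = 40
gender = np.repeat([0, 1], n // 2)          # 0 = female, 1 = male (dummy coding)
bmi = rng.normal(22, 3, n)                  # estimated BMI of the chosen figure
figure = 5 + 0.8 * gender + 0.3 * (bmi - 22) + rng.normal(0, 1, n)

def rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

ones = np.ones(n)
full = np.column_stack([ones, gender, bmi])   # covariate + Gender
reduced = np.column_stack([ones, bmi])        # covariate only

# F test for Gender adjusted for BMI: 1 numerator df, n - 3 denominator df
rss_full, rss_red = rss(full, figure), rss(reduced, figure)
df_resid = n - full.shape[1]
F = (rss_red - rss_full) / (rss_full / df_resid)
print(f"F(1, {df_resid}) = {F:.2f} for Gender, adjusted for EstimatedBMI")
```

This "extra sum of squares" F is exactly what SPSS reports for the fixed factor in the ANCOVA table, so it's a handy cross-check.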

Just one comment/question -

I assume that the figures that vary on SHR don't vary at all on WHR (or very little), and vice-versa?

If they did, that could potentially cloud the results (i.e., for Gender = male, is it the WHR, or a linear combination of WHR + SHR?)
John, thanks very much for your prompt reply - apologies for not replying sooner - I've been thinking things through. Firstly, however, an answer to your question:

The female figures vary in waist size, and thus in WHR. Thus, they also vary in BMI, but, along the lines of your question, also in shoulder-waist ratio. Males vary in shoulder size, and thus SHR. Thus, they also vary in BMI, and in shoulder-waist ratio. Are you raising this issue to point out that, like the variance in BMI that I intend to include in my analysis, variance in these additional ratios might explain to some degree the variance in attractiveness? If so, this is a good point. However, given the scope of this project, I think I will have to limit my statistical analysis to the combination of BMI and waist (female) / shoulder (male) size, and note the lack of statistical analysis of further body shape variations as a weakness of my study.

If I may, I'd like to outline my understanding of your feedback in the context of one hypothesis: The selection of most attractive figures will not exhibit gender differences.
  • IV1: Sex
  • IV2: Nationality
  • IV3: Location of being brought up
  • IV4: Visual Condition (media/control)
  • DV1: IdealMaleBMI (estimated BMI)
  • DV2: IdealMaleFig (shape)
  • DV3: IdealFemaleBMI (estimated BMI)
  • DV4: IdealFemaleFig (shape)
I imagine then, that my first task is to assess the normality. I intend to split the file by the IVs and get histograms, and then descriptives for each group, including Kurtosis & Skewness. I will remove outliers where it seems appropriate.
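The skewness/kurtosis divided by standard error screen is easy to automate outside SPSS too. A rough sketch in Python/NumPy, using the large-sample approximations SE(skew) ≈ √(6/n) and SE(kurtosis) ≈ √(24/n) - SPSS uses slightly more exact small-sample formulas, so the z values won't match its output digit-for-digit:

```python
import numpy as np

def normality_screen(x):
    """Skewness and excess kurtosis divided by their approximate standard
    errors (SE_skew ~ sqrt(6/n), SE_kurt ~ sqrt(24/n)); the +/-2 rule of
    thumb flags values outside that range."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    z = (x - x.mean()) / x.std()
    skew = np.mean(z ** 3)
    kurt = np.mean(z ** 4) - 3        # excess kurtosis: 0 for a normal
    return skew / np.sqrt(6 / n), kurt / np.sqrt(24 / n)

# A symmetric, flat-topped sample: skewness ~ 0, but clearly platykurtic,
# so the kurtosis z-score falls outside +/-2
z_s, z_k = normality_screen(np.linspace(-1.0, 1.0, 101))
print(f"z(skew) = {z_s:.2f}, z(kurtosis) = {z_k:.2f}")
```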

Let's assume normality. My intention then is to use a Univariate GLM to compare the genders' average IdealMaleFig selections for each group (nationality, location, visual condition). To do this, I would stop using sex to split the file, then apply sex as a fixed factor, IdealMaleFig as the dependent variable, and IdealMaleBMI as a covariate. I would then do the same for IdealFemaleFig and IdealFemaleBMI. If neither test produces significant differences, the hypothesis is supported.

This sounds cool - what do you think?
Many thanks,
Hello again, another question. In my earlier post, my intention was to perform several ANCOVAs, one for each combination of the three IVs: nationality, location, and visual condition (based on a file split using these variables in SPSS).

I'm aware that multiple comparisons increase the probability of a Type I error, however. Do you think this applies here? Should I instead be including these variables in the ANCOVA, and thus having four fixed factors (when you include gender)? This makes for a messy output, with 4 main effects and 11 interactions! Are there implications of such a complex ANCOVA that I should be aware of?

Many thanks,


TS Contributor
Personally, my inclination is to keep things simple - just my opinion.

Don't worry so much about increasing Type I errors - it doesn't appear that your original hypotheses included any conjecture(s) about interactions among the IV's, but try it both ways - see what you get. The interactions may prove interesting!
Thanks very much John, I will consider both - initial indications are that the interactions are not significant - thus I guess it's not a problem to do separate tests?

I am bumbling through another hypothesis. When subjects select a figure, they also rate it for attractiveness, health, and fertility. The hypothesis is that fertility and particularly health predict attractiveness. I expect the relationship to be curvilinear (health and fecundity will increase as figure WHR increases to 0.7 (the most attractive WHR), but decrease as WHR increases further).

My approach has been to do a correlation - I graphed an overlay scatterplot with Figure-Fecundity and Figure-Health pairs, and then got a Fit Line, and compared the r^2 values. This produces what I expect/hope to find, but does this sound appropriate to you? Further, given my expectation of curvilinearity, and that the data looked roughly curvilinear, I set the Fit Line to be quadratic. Is there a test however, to know if this is appropriate or not?
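Outside SPSS, the linear-vs-quadratic fit-line comparison amounts to fitting both polynomials and comparing R². A sketch in Python/NumPy, on invented ratings that peak near WHR = 0.7 (the numbers are made up for illustration):

```python
import numpy as np

# Invented mean health ratings, peaking near WHR = 0.7 then declining
whr = np.array([0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 1.00])
health = np.array([55, 70, 82, 78, 66, 55, 44, 36, 30], dtype=float)

def r_squared(y, y_hat):
    """Proportion of variance explained: 1 - SS_residual / SS_total."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

lin = np.polyval(np.polyfit(whr, health, 1), whr)    # straight fit line
quad = np.polyval(np.polyfit(whr, health, 2), whr)   # quadratic fit line
print(f"linear R^2 = {r_squared(health, lin):.3f}, "
      f"quadratic R^2 = {r_squared(health, quad):.3f}")
```

One caveat: because the quadratic model nests the linear one, its R² is *always* at least as high, so the raw comparison can't by itself tell you the quadratic term is warranted - that needs the trend/significance test.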

Additionally, however, I'm getting confused because I have two sets of data about attractiveness. Subjects select the most attractive figure (1-9), but they also rate that figure for how attractive it is (0-100). Should I be combining these numbers somehow, and be correlating this with fecundity and health? In my correlations above, I also included Figure-Attractiveness, and found this to have the highest r^2 value.

Thanks again,


TS Contributor
Hyperstat has a good discussion of "trend analysis" which covers ways to test (it's basically a post-hoc test done after ANOVA) whether your relationship fits a polynomial / curvilinear trend:

You could also do something quickly in Excel - do an XY scatterplot, add a trend line, and specify 2nd(?) degree polynomial, and include the R^2 and best-fit equation on the chart.

On the ratings, I would just include this as a "side" discussion if it's not a central theme of your study (i.e., if you want to just focus on figure selection) - it could provide basis for a future study....I wouldn't try to "combine" them - maybe just include the correlations as an "interesting" side note.
Okay, got a bit beyond me here.

I tried to work through the equation on hyperstat, figuring the average health ratings for each figure. There are 9, so I worked through squaring the coefficients and dividing by the number of subjects who selected and rated that figure. However, I then got stuck, because I didn't know how to figure the MSE, and I wasn't quite sure why I was doing what I was doing anyway. Was the idea to calculate t, and that if t was significant, then the relationship is quadratic / curvilinear?

I'm not quite sure I understand this - how to determine whether a relationship is quadratic or linear. Some things seem to indicate that there are linear and quadratic elements to the one relationship - is there any way you can make this clearer? I'm assuming that I can't just assume the relationship is quadratic because it looks vaguely so, and because that's what I'm expecting.

Thanks for the advice regarding the attraction rating - sounds sensible. Other than the above, things seem to be going okay - 6 hypotheses are hopefully figured out, just this and one other to go - thanks again for your help.



TS Contributor
You start at linear, then do quadratic, then cubic, etc. and stop at the highest order trend that is significant.

A trend line could have significant linear and quadratic "components." If it is a general downward or upward increase - i.e., a more-or-less constant slope, then the linear component is significant. If there's a significant change in the slope, then the quadratic component is significant, and the trend would be called "quadratic" (assuming the cubic trend isn't significant).
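The "stop at the highest significant order" procedure can be run as a hierarchical regression: fit the linear model, add the squared term, and test whether the drop in residual sum of squares is significant (this F is the square of the t from the trend-contrast approach, and the MSE is just the residual SS of the bigger model divided by its residual df). A sketch in Python/NumPy on invented mean ratings for the nine figures:

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

# Invented mean health ratings for figures 1..9: peaked, not monotone
x = np.arange(1.0, 10.0)
y = np.array([40, 55, 68, 76, 80, 75, 65, 52, 38], dtype=float)
n = len(x)

ones = np.ones(n)
X_lin = np.column_stack([ones, x])          # intercept + linear term
X_quad = np.column_stack([ones, x, x ** 2]) # ... plus quadratic term

rss_lin, rss_quad = rss(X_lin, y), rss(X_quad, y)
df_resid = n - X_quad.shape[1]              # residual df of the quadratic model
F_quad = (rss_lin - rss_quad) / (rss_quad / df_resid)
print(f"F(1, {df_resid}) = {F_quad:.1f} for the quadratic component")
```

Compare the resulting F against the F(1, df) critical value (or let SPSS do it via polynomial contrasts); a significant F here is what licenses calling the trend quadratic rather than linear.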

Believe it or not, I'm still working on these problems. It looks like I am not going to be able to achieve equal sample sizes - I think instead I will have about 20 males in each visual condition, and 30 females in each visual condition. I imagine that this discrepancy is serious enough that I must fulfill the assumptions behind ANOVA of normal distributions and equal variances between cells?

If so, how do I validate that I've fulfilled these assumptions? I have graphed them (histograms and boxplots), but this obviously doesn't give me any indication of statistical significance. I believe I can divide the kurtosis and skewness by their standard errors, with a figure within +/-2 for both suggesting a normal distribution? How can I validate that the variances are not significantly different?
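On the variances question, the usual answer is Levene's test (if memory serves, SPSS prints it when you tick "Homogeneity tests" under Options in the GLM dialog). It's just a one-way ANOVA on the absolute deviations from each group's mean, so it can be sketched directly; the data below are invented:

```python
import numpy as np

def levene_W(*groups):
    """Levene's statistic (mean-centred version): a one-way ANOVA on the
    absolute deviations from each group's mean. Compare W against an
    F(k - 1, N - k) distribution; a significant result suggests the
    group variances differ."""
    z = [np.abs(np.asarray(g, float) - np.mean(g)) for g in groups]
    k = len(z)
    n_total = sum(len(g) for g in z)
    grand = np.mean(np.concatenate(z))
    ss_between = sum(len(g) * (np.mean(g) - grand) ** 2 for g in z)
    ss_within = sum(np.sum((g - np.mean(g)) ** 2) for g in z)
    return ((n_total - k) / (k - 1)) * ss_between / ss_within

rng = np.random.default_rng(2)
males = rng.normal(5, 1.0, 20)     # invented figure choices, n = 20
females = rng.normal(5, 1.1, 30)   # n = 30, similar spread
print(f"Levene W = {levene_W(males, females):.2f}")
```

(The median-centred variant, sometimes called Brown-Forsythe, is more robust when the groups themselves are skewed.)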

Are there any other assumptions for tests under the GLM that I should be accounting for? I got advice from a statistician, who started out by highlighting ceiling effects (meaning non-normality?) and the like that I hadn't picked up - is there anything beyond the mentioned assumptions I should be looking out for?

Thanks very much again,
My bad - I just read another post which explained about the Shapiro-Wilk and Kolmogorov-Smirnov tests for normality, which I found in SPSS. Am I right that a non-significant result on these tests indicates normality? (I gather they only test normality, so I'd still need something separate, like Levene's test, for homogeneity of variances?)

However, if there are further things I should be looking out for, please let me know. Or is it a case of isolating where just these two assumptions aren't fulfilled, and then looking at the data to find out why (i.e. a ceiling effect)?

Many thanks,


TS Contributor
Unfortunately, the sample size issue far outweighs any consideration of normality or homogeneity of variance.

ANOVAs are pretty robust to departures from normality or to non-homogeneity of variance, but if gender is an independent variable in the study, then 20 vs 30 is a big enough difference to be concerned about - it will confound your results, to a degree - in other words, what you see as a gender difference or interaction with another factor may really only be due to the fact that there are more females than males...

Either you need to:

(1) get 10 more males to participate, or
(2) randomly throw out 10 female data points, or
(3) work directly with a statistician who can help you with weighting the male data points* - which makes the analysis more difficult, or
(4) do separate analyses for each gender, and if you see any gender differences, you'll only be able to make general comments, and propose a further study to work them out

*if you look in SPSS Help, they may tell you how to make adjustments to the "regular" ANOVA if you have unequal sample sizes
* the following link describes some SPSS options for dealing with unequal sample sizes:
Hey, that's a great resource, thanks.

So even if the assumptions of normality and homogeneity of variance are fulfilled, the data will still be biased towards the larger sample? I must have misunderstood my text. Is there an approximate cutoff for an acceptable ratio, i.e. 0.75 isn't close enough, but 0.80 is?

Also, am I correct in saying that a ceiling effect is a problem due to a deviation from normality? So the only things I need to look out for are such deviations from these assumptions (for the purpose of noting them, I mean, not to prevent the use of the robust ANOVA).

Thanks again,


TS Contributor
Actually, I looked back through the earlier postings in this thread, and maybe the unequal sample sizes won't be that big of an issue - gender is your only independent variable, with BMI as a covariate, so maybe it won't be a big deal. It would be a big deal if you had another independent variable in there...

I wouldn't worry too much about ceiling effects - I mean, the normal distribution goes to infinity in both directions - hardly anything we measure in real life does that, but the normal distribution is still a decent model...and ANOVA is robust to departures from normality...

The unequal sample sizes don't really "bias" things - they cause confounding (a situation where the effect of one variable isn't a "clean" estimate of the effect - other things or other variables are playing into it).
I do actually have another IV: a Visual Condition (control / media) :0( This stuff is really very intricate (read difficult!). There's just so much knowledge I just don't have.

I'll have to work on gleaning a few more male participants from somewhere, and just have to throw away the surplus female participants.

I understand your point regarding the confound - thanks. And your advice regarding ceiling effects.

Exploring the interaction

Ok, I think I may be being stupid here - I've done nothing but stats for the last few days, so I think the mind's starting to slow.

I have two tests with significant interaction effects but no main effects - one is a two-way ANOVA (sex x visual condition), the other a 3x2x2 repeated measures ANOVA (question (Ideal, SIdeal, OSIdeal) x sex x visual condition).

Am I right in saying that I now need to do post-hoc tests to find where the differences lie? i.e. in the two-way ANOVA, it could be that just one of the sexes changes across conditions, or both do, right? I tried to do a Bonferroni (and a Tukey for kicks - these were the tests I recognised), but the interface doesn't give me the option to do a test on 'question' in the repeated measures interaction, and when I try to do it on sex or visual condition in the two-way ANOVA, it says it can't because they only have two levels.
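SPSS is right that Tukey/Bonferroni post-hocs don't apply to a two-level factor - what follows a significant interaction there is usually a *simple effects* analysis: test the visual-condition effect within each sex separately, dividing alpha by the number of tests. A sketch of that breakdown in Python/NumPy, with invented data containing a deliberately crossed-over interaction (assumed numbers throughout):

```python
import numpy as np

def welch_t(a, b):
    """Welch's t statistic and approximate df for two independent samples
    (robust to unequal variances and unequal group sizes)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

rng = np.random.default_rng(3)
# Invented crossover interaction: media shifts male choices up, female down
male_ctrl, male_media = rng.normal(5, 1, 20), rng.normal(6, 1, 20)
fem_ctrl, fem_media = rng.normal(6, 1, 30), rng.normal(5, 1, 30)

for label, a, b in [("males", male_ctrl, male_media),
                    ("females", fem_ctrl, fem_media)]:
    t, df = welch_t(a, b)
    # two simple-effects tests, so compare each p against alpha = .05 / 2
    print(f"{label}: t({df:.1f}) = {t:.2f}")
```

In SPSS itself the equivalent move is splitting the file by sex and rerunning the condition comparison, or using EMMEANS with the COMPARE subcommand in syntax.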

I really think I'm being a dunce here, sorry - if you could point me (again!) in the right direction, I'd appreciate it.