Best laid plans...

#1
My thesis design was to give participants 2 tests. One measures a personal "type" and the other asks "opinions".

My manipulation was to reverse the order of test presentation (i.e. group 1: "type" --> "opinion"; group 2: "opinion" --> "type").

The dependent variable is the score on "opinion" (scores range from 8-56).

The independent variable was the order in which they received the test. (1,2)

An ANOVA or t-test comparing the mean "opinion" scores for a main effect of test order does the trick so far, but...

The "type" test gave subjects scores in 3 different "types". My intention was to use these scores to divide the subjects into three different groups (i.e. "type" A, B, C) which would become my quasi-independent variable.

My plan was to run a 3 (type) x 2 (test order) ANOVA.

Unfortunately, when I got my data, "type" did not clearly split into three groups. Some scored low in all three, others high in two or three types, etc. My ANOVA sprouted into a messy patchwork of ad hoc group assignments and became desperately unworkable.

What I want to do is a statistical analysis that will allow me to look at the 3 "type" variables as continuous variables (the mean score in each type) and consider their statistical weight, and also take into account test order as it affects my "opinion" score DV.

i.e. Subject #1 - "type" A = 1, B = 4, C = 7. This would weight the C score (7) more heavily than B (4), and both C and B more heavily than A (1), thereby giving this subject a C "tendency." So, the question becomes "how well does this C, B, A weighted tendency, taken together with receiving the 'opinion' test first (or second), predict his score on the 'opinion' test?"

I first thought multiple regression, which I could see working for how "type" affects "opinion" score, but I don't see how test order could be considered in a multiple regression analysis (which is, after all, the real manipulation, and crux of the study). Could I somehow use a "dummy variable" to code test order?

I also thought maybe logistic regression, to maybe see how "type" and "opinion" test score predicted test order (i.e. 1 or 2), but...that seems backasswards, and frankly I don't understand LR well enough to be sure.

I'm baffled and my adviser is stumped. Trying to figure this out using my stats book feels like I'm reliving The Da Vinci Code. ANY input or suggestion would be cherished forever and ever by all of humanity, no hyperbole intended.

And thanks for reading this incredibly long and boring post.
 

CB

Super Moderator
#2
I first thought multiple regression, which I could see working for how "type" affects "opinion" score, but I don't see how test order could be considered in a multiple regression analysis (which is, after all, the real manipulation, and crux of the study). Could I somehow use a "dummy variable" to code test order?
Hi there! You could absolutely use a dummy variable for this - just code one order as zero and the other order as one, and you're away (not 1,2 though, unless you're using SPSS and have specified test order as a nominal variable, in which case that'd be fine).

You'd end up with a single coefficient for "order" - most programs will let you choose which level of the dummy variable you'd like to have as a "reference category" - i.e. you get a coefficient for the influence of condition: type -> opinion, or a coefficient for opinion -> type, but not both, since the second coefficient would be redundant.

Multiple regression was originally designed for continuous predictors, but using dummy variables is perfectly acceptable and very commonly done. ANOVA is (I believe) in fact just a special case of regression that developed in the pre-computer days (ANOVA is much easier to do by hand, apparently).

Do make sure that you can justify the assumptions of multiple regression, though.
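For what it's worth, here's a minimal sketch of the dummy-coding idea in Python rather than SPSS (plain numpy least squares; every number below is simulated purely for illustration, including the built-in order effect of +3):

```python
# Hedged sketch (not from the thread): dummy-coding test order in a
# multiple regression, fitted with plain numpy least squares.
# All data are simulated purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 95
type_a = rng.uniform(1, 8, n)   # continuous "type" scores (made up)
type_b = rng.uniform(1, 8, n)
type_c = rng.uniform(1, 8, n)
order = (np.arange(n) < 48).astype(float)  # 0 = type->opinion, 1 = opinion->type

# Simulated "opinion" scores with a built-in order effect of +3
opinion = 20 + 2.0 * type_c - 1.0 * type_b + 3.0 * order + rng.normal(0, 2, n)

# Design matrix: intercept, three continuous predictors, one 0/1 dummy
X = np.column_stack([np.ones(n), type_a, type_b, type_c, order])
beta, *_ = np.linalg.lstsq(X, opinion, rcond=None)

# beta[4] is the single "order" coefficient: the expected shift in
# opinion score for the order=1 group, holding the type scores constant.
```

The reference-category idea is visible here: order=0 is absorbed into the intercept, so you get one coefficient for order=1 relative to it, not two.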

PS. A good statistics textbook is much more entertaining than the Da Vinci Code :D
 
#4
Thanks so much for your previous help; I've followed the above suggestions and all went swimmingly.

What I found was basically zero effect for test order, like, nothing (.001 of the variance accounted for by adding that variable). This struck me as odd because my t-tests showed the "type" scores moving in opposite directions, indicating at least some interaction. (Type A scores went up, type B went down, and type C stayed the same.)

So I'm wondering if the variables moving in opposite directions is negating the multiple regression's consideration of test order. In other words, if the mean for type A moves from 3 -> 4, and the mean for type B moves from 4 ->3 (and C 3 -> 3), would that "cancel out" the effect for test order because adding that variable to the multiple regression analysis (as the 0,1 dummy variable) considers the effect across all 3 variables? (thereby adding nothing to the predictive ability of the model?)

If the above is accurate, is there a way to analyze the crossing effect of A tendency and B tendency for significance while still "weighting" the types using continuous scores?

I'm starting to think about logistic regression again, and that scares me, deeply.
 

CB

Super Moderator
#5
What I found was basically zero effect for test order, like, nothing (.001 of the variance accounted for by adding that variable). This struck me as odd because my t-tests showed the "type" scores moving in opposite directions, indicating at least some interaction. (Type A scores went up, type B went down, and type C stayed the same.)

So I'm wondering if the variables moving in opposite directions is negating the multiple regression's consideration of test order. In other words, if the mean for type A moves from 3 -> 4, and the mean for type B moves from 4 ->3 (and C 3 -> 3), would that "cancel out" the effect for test order because adding that variable to the multiple regression analysis (as the 0,1 dummy variable) considers the effect across all 3 variables? (thereby adding nothing to the predictive ability of the model?)

If the above is accurate, is there a way to analyze the crossing effect of A tendency and B tendency for significance while still "weighting" the types using continuous scores?
It sounds to me like you're talking about an interaction effect - the effect of "order" depending on the level of the various personality types. How about trying adding interaction terms to the model? (order x A, order x B, order x C?)

I'm starting to think about logistic regression again, and that scares me, deeply.
Haha! Logistic regression can be your friend, it really isn't that scary at all :p
 
#6
It sounds to me like you're talking about an interaction effect - the effect of "order" depending on the level of the various personality types. How about trying adding interaction terms to the model? (order x A, order x B, order x C?)



Haha! Logistic regression can be your friend, it really isn't that scary at all :p
That intrigues me. I don't remember seeing anything about that in my stats book, though...

Would that be accomplished by, in SPSS, doing Analyze -> Regression -> Linear and then putting in A and order, then NEXT, then B and order, then NEXT, then C and order?

And do you think Logistic regression would be appropriate?
 

CB

Super Moderator
#7
Would that be accomplished by, in SPSS, doing Analyze -> Regression -> Linear and then putting in A and order, then NEXT, then B and order, then NEXT, then C and order?
Ah, not quite. The 'next' function is used to allow you to specify the order of entry of 'blocks' of variables into the equation - but the variables within the blocks remain separate in the equation.

Unfortunately I don't have SPSS at home and I'm not 100% sure on the easiest way to specify interactions off the top of my head... but if there isn't an obvious function within the regression dialogs, one easy-ish way would be to create the interaction terms yourself: Transform > Compute variable, and create a new variable "order*A" (order multiplied by type A score) - and the same for the other personality types. Then enter the interaction terms in a new regression analysis.
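In case it helps to see the "compute the products yourself" step outside of SPSS, here's a tiny numpy sketch (the numbers are made up):

```python
# Minimal sketch of building an interaction term by hand, mirroring
# SPSS's Transform > Compute: multiply the 0/1 order dummy by the
# continuous type score, elementwise. All numbers are made up.
import numpy as np

order = np.array([0.0, 0.0, 1.0, 1.0, 1.0])    # 0/1 dummy for test order
type_a = np.array([1.0, 4.0, 2.0, 7.0, 3.0])   # continuous type A scores

order_x_a = order * type_a   # the "order*A" interaction variable

# Rows with order = 0 get 0; rows with order = 1 keep their type A score.
```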

And do you think Logistic regression would be appropriate?
I think linear regression is a better conceptual match for what you're trying to do right now - maybe you can become pals with logistic reg some other day!
 
#8
How to calibrate odds in credit scoring model

Dear all,

I have developed a scorecard for a financial institution; the objective was to forecast future performance from past behaviour. Credit scoring is a process whereby the information provided is converted into numbers that are added together to arrive at a score.
After first calculating the odds, the population was segmented and a few scorecards were developed in order to score everyone according to their characteristics.
Through multivariate statistical methods, predictive attributes were selected and added to every scorecard. In this way the whole population was scored. Now I need to calibrate my scores (banding or scaling) to ensure score results have the same meaning across all scorecards, to give readings in appropriate units, and to provide default probabilities.
e.g.

Calibrated Score   Calibrated Odds   Default Rate
100                1:1               50%
200                2:1               33%
300                4:1               20%
...
800                128:1             0.8%
900                256:1             0.4%

where odds of 1:1 mean that 1 in every 2 people may be a defaulter.

I am not sure how to calibrate/standardize my scores. Can anyone please help me with it? Any input or suggestion would be highly appreciated.
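For what it's worth, the example table is consistent with the standard log-odds scaling used in scorecards, score = offset + factor × ln(odds). A hedged sketch, assuming the odds in the table are good:bad odds, anchored at 1:1 odds = score 100, doubling every 100 points (PDO = 100), which is what the 100/200/300...800/900 rows imply:

```python
# Sketch of the usual scorecard scaling implied by the example table:
# score = offset + factor * ln(odds), where factor = PDO / ln(2).
# Assumptions: "odds" are good:bad, the anchor is odds 1:1 at score 100,
# and PDO (points to double the odds) is 100, matching the table.
import math

PDO = 100.0                     # points to double the odds
factor = PDO / math.log(2)
offset = 100.0                  # score at odds 1:1 (since ln(1) = 0)

def calibrated_score(odds):
    """Map good:bad odds onto the calibrated score scale."""
    return offset + factor * math.log(odds)

def default_rate(odds):
    """Good:bad odds of g:1 imply a default rate of 1 / (g + 1)."""
    return 1.0 / (odds + 1.0)

# e.g. calibrated_score(2) -> 200, calibrated_score(128) -> 800,
# default_rate(1) -> 0.5, default_rate(4) -> 0.2, as in the table.
```

To calibrate a raw model score, you would regress observed log-odds on the raw score (per scorecard) and then apply this scaling, so that a given score means the same default probability on every card.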

Thanks,
Saira
 
#10
Ah, not quite. The 'next' function is used to allow you to specify the order of entry of 'blocks' of variables into the equation - but the variables within the blocks remain separate in the equation.

Unfortunately I don't have SPSS at home and I'm not 100% sure on the easiest way to specify interactions off the top of my head... but if there isn't an obvious function within the regression dialogs, one easy-ish way would be to create the interaction terms yourself: Transform > Compute variable, and create a new variable "order*A" (order multiplied by type A score) - and the same for the other personality types. Then enter the interaction terms in a new regression analysis.



I think linear regression is a better conceptual match for what you're trying to do right now - maybe you can become pals with logistic reg some other day!

Ok, I think I'm with you. You're saying multiply the scores of the people (in the separate categories, i.e. A(type)1(order) x A(type)2(order), B1 x B2, and C1 x C2), make those results a variable, and then enter it into the regression analysis, right?
:yup:

At the risk of appearing obtuse...

I have an N = 51 for order1, and an N = 44 for order2. That leaves 7 order1's with nothing to multiply them against...

Would it be ok to multiply those 7 by the average score on Order2?
 

CB

Super Moderator
#11
Ok, I think I'm with you. You're saying multiply the scores of the people (in the separate categories, i.e. A(type)1(order) x A(type)2(order), B1 x B2, and C1 x C2), make those results a variable, and then enter it into the regression analysis, right?
:yup:

At the risk of appearing obtuse...

I have an N = 51 for order1, and an N = 44 for order2. That leaves 7 order1's with nothing to multiply them against...

Would it be ok to multiply those 7 by the average score on Order2?
Heyhey! That wasn't quite what I had in mind - what I think you're looking for is this:

First of all, make absolutely sure you've got order dummy-coded as [0, 1], not [1, 2] or anything else - that won't fly for this bit! Then...

Interaction of Type A score with order = Type A score x order [either 0 or 1]
Interaction of Type B score with order = Type B score x order [either 0 or 1]
Interaction of Type C score with order = Type C score x order [either 0 or 1]

What you'll find, of course, is that the interaction terms are the "type" score for those with order=1, and zero for the others. You shouldn't need to worry about the N for the two different orders being different. You can then enter the interaction terms into the regression equation.

What I'm thinking though as I'm typing away is that the interaction effect may differ depending on which way round your dummy-coding is (i.e. which order is specified as 1, and which as 0), since all the interaction variability will be tied up in the cases with order "1". It might be worth trying it both ways - switching which order is coded 1 and which 0 - to see what happens, but I'd also suggest seeing if you can find some references on assessing interactions when one of the variables in the interaction is dummy-coded - there might be cautions about or problems with this approach, and I'd hate to have blithely led you down the garden path!

Collinearity between the interaction scores and the main effect type scores could be another possible issue, given that the type score and interaction terms will be the same for all cases with order = 1... hmm, if anyone else wants to weigh in on this that would be really great (for me too!)
 
#12
For posterity, and those curious onlookers...

I talked to an economics professor who specializes in multiple regression and here's the plan (which I haven't implemented yet):

I'm going to run a multiple regression separately for each test order, and dummy code for type. Then I'm planning on running F-tests to compare the results and look for significance.

I dunno, we'll see.
 
#13
For posterity, and those curious onlookers...

I talked to an economics professor who specializes in multiple regression and here's the plan (which I haven't implemented yet):

I'm going to run a multiple regression separately for each test order, and dummy code for type. Then I'm planning on running F-tests to compare the results and look for significance.

I dunno, we'll see.
nope, the above was incorrect...


However I think this may be right:

opinion = b0 + b1(order x typeA) + b2(order x typeB) + b3(order x typeC) + b4(typeA) + b5(typeB) + b6(typeC) + b7, b8, b9, etc. (any other variables)

(order = 0 or 1 coded as dummy variables)

It should be asking "how much variance is accounted for by the effect of order, depending on the type, on the dependent variable (the opinion score in this case)?"
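That equation can be sketched with plain numpy least squares like so (toy simulated data with an invented order x type A interaction; note that most texts recommend also keeping the order main effect in the model whenever its interactions are included, so it's added here as an extra column):

```python
# Hedged sketch: the interaction model above, fitted by least squares
# on simulated data. Column order matches the equation: intercept,
# interactions, then type main effects, plus an order main effect.
import numpy as np

rng = np.random.default_rng(1)
n = 95
A = rng.uniform(1, 8, n)        # continuous type scores (made up)
B = rng.uniform(1, 8, n)
C = rng.uniform(1, 8, n)
order = (np.arange(n) < 51).astype(float)   # 0/1 dummy for test order

# Simulated opinion scores with a genuine order x type A interaction (+2)
opinion = 20 + 1.5 * C - 1.5 * B + 2.0 * order * A + rng.normal(0, 2, n)

X = np.column_stack([
    np.ones(n),    # b0
    order * A,     # b1: order x type A
    order * B,     # b2: order x type B
    order * C,     # b3: order x type C
    A, B, C,       # b4-b6: type main effects
    order,         # order main effect (kept alongside its interactions)
])
beta, *_ = np.linalg.lstsq(X, opinion, rcond=None)
# beta[1] estimates how much the type A slope differs between the two orders.
```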