Multiple Regression

#1
If I have 2 categorical variables, how do I assign dummy variables to them? I want to assign 0 and 1 to one categorical variable. Can I also assign 0 and 1 to the other categorical variable?
 

trinker

#3
Yes, but you need to create a new variable (kinda), i.e. a column, for each group in the original categorical variable except one. You make J-1 recoded columns, where J is the number of groups, and you do this separately for each categorical variable.

Here's an example:
This is what the data frame looks like. We want to recode race and gender.
Code:
    race gender        score
1  black female  0.865168583
2  black female  0.725432147
3  asian female  0.881808984
4  asian   male  1.817372208
5  asian   male  1.109067710
6  black female -0.100645436
7  asian female -0.291707834
8  white   male  0.009416695
9  asian female -0.691128384
10 white female  0.036868814
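Notice race was recoded as asian and black. I could have used asian & white or white & black instead. It doesn't matter, but whatever group you don't include becomes your comparison (reference) group, the one represented by the y intercept in the regression coefficients table. The same goes for gender: only female was recoded, so male is the comparison group. With both variables coded this way, the intercept is the predicted score for a white male, and each dummy's coefficient is the estimated difference from that reference.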
Code:
    race gender        score race.asian race.black gender.female
1  black female  0.865168583          0          1             1
2  black female  0.725432147          0          1             1
3  asian female  0.881808984          1          0             1
4  asian   male  1.817372208          1          0             0
5  asian   male  1.109067710          1          0             0
6  black female -0.100645436          0          1             1
7  asian female -0.291707834          1          0             1
8  white   male  0.009416695          0          0             0
9  asian female -0.691128384          1          0             1
10 white female  0.036868814          0          0             1
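A minimal Python sketch of the J-1 recoding above, assuming plain lists rather than any particular stats package. `dummy_code` is a hypothetical helper name; the rows are the example data frame from this post. The `reference` argument is the group that gets no column (white for race, male for gender), so it should reproduce the race.asian, race.black and gender.female columns.

```python
# Minimal sketch of J-1 dummy coding with plain Python lists.
# `dummy_code` is a hypothetical helper, not a library function.

rows = [  # (race, gender, score) from the example data frame above
    ("black", "female", 0.865168583),
    ("black", "female", 0.725432147),
    ("asian", "female", 0.881808984),
    ("asian", "male", 1.817372208),
    ("asian", "male", 1.109067710),
    ("black", "female", -0.100645436),
    ("asian", "female", -0.291707834),
    ("white", "male", 0.009416695),
    ("asian", "female", -0.691128384),
    ("white", "female", 0.036868814),
]


def dummy_code(values, reference):
    """Return {level: [0/1, ...]} with one column per level except `reference`."""
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not in {levels}")
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }


race = [r for r, _, _ in rows]
gender = [g for _, g, _ in rows]

race_dummies = dummy_code(race, reference="white")     # J = 3 -> 2 columns
gender_dummies = dummy_code(gender, reference="male")  # J = 2 -> 1 column

print("race gender score race.asian race.black gender.female")
for i, (r, g, s) in enumerate(rows):
    print(r, g, s, race_dummies["asian"][i], race_dummies["black"][i],
          gender_dummies["female"][i])
```

Picking a different `reference` just changes which group the intercept describes; the fitted model is equivalent.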
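This is how it would look if you recoded every group in the variable, but doing so would be redundant: within each variable the full set of dummies adds up to 1 in every row, which duplicates the intercept (perfect multicollinearity, often called the dummy variable trap). The computer program would either not accept the final group or would automatically drop it.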
Code:
    race gender        score race.asian race.black race.white gender.female gender.male
1  black female  0.865168583          0          1          0             1           0
2  black female  0.725432147          0          1          0             1           0
3  asian female  0.881808984          1          0          0             1           0
4  asian   male  1.817372208          1          0          0             0           1
5  asian   male  1.109067710          1          0          0             0           1
6  black female -0.100645436          0          1          0             1           0
7  asian female -0.291707834          1          0          0             1           0
8  white   male  0.009416695          0          0          1             0           1
9  asian female -0.691128384          1          0          0             1           0
10 white female  0.036868814          0          0          1             1           0
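A short Python check of why the all-groups version is redundant, using the race and gender columns from the table above. `full_dummies` is a hypothetical helper. Each full set of dummies sums to 1 in every row, exactly like an intercept column of 1s, so one column in each set can always be computed from the others.

```python
# Sketch: why coding all J groups is redundant (the "dummy variable trap").
# `full_dummies` is a hypothetical helper written for this example.

race = ["black", "black", "asian", "asian", "asian",
        "black", "asian", "white", "asian", "white"]
gender = ["female", "female", "female", "male", "male",
          "female", "female", "male", "female", "female"]


def full_dummies(values):
    """One 0/1 column per level (all J columns, no reference dropped)."""
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in sorted(set(values))}


race_full = full_dummies(race)
gender_full = full_dummies(gender)
intercept = [1] * len(race)

# Each full set adds up to 1 in every row, i.e. it reproduces the intercept column...
race_sum = [sum(col[i] for col in race_full.values()) for i in range(len(race))]
gender_sum = [sum(col[i] for col in gender_full.values()) for i in range(len(gender))]
print(race_sum == intercept, gender_sum == intercept)  # True True

# ...so any one column is fully determined by the others, e.g. white = 1 - asian - black.
white_from_others = [1 - a - b for a, b in zip(race_full["asian"], race_full["black"])]
print(white_from_others == race_full["white"])  # True
```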
Think of dummy coding as a simple binary yes/no: a 0 is no and a 1 is yes. By recoding J-1 groups we've covered every scenario. So if an observation is 0 for asian and 0 for black, by default they're white. In a way these columns are new variables, but they're really not; it's a way of 'tricking' the regression model into accepting non-numeric responses. The entire J-1 set of variables is entered into the model as a block (all at once).
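To see what entering the J-1 block does, here is a Python sketch that fits score on an intercept plus the asian and black dummies by ordinary least squares and compares the coefficients with the group means. `solve` and `ols` are hypothetical helpers (a small normal-equations solver, used only to keep the example dependency-free); in practice a stats package does this step. With only race in the model, the intercept should equal the white (reference) mean and each dummy's coefficient should equal that group's mean minus the white mean.

```python
# Sketch: enter the J-1 race dummies as a block and check what the coefficients mean.
# `solve` and `ols` are hypothetical helpers written for this example.

race = ["black", "black", "asian", "asian", "asian",
        "black", "asian", "white", "asian", "white"]
score = [0.865168583, 0.725432147, 0.881808984, 1.817372208, 1.109067710,
         -0.100645436, -0.291707834, 0.009416695, -0.691128384, 0.036868814]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix (e.g. all J dummies plus an intercept)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)


intercept = [1.0] * len(score)
asian = [1.0 if r == "asian" else 0.0 for r in race]
black = [1.0 if r == "black" else 0.0 for r in race]

b0, b_asian, b_black = ols([intercept, asian, black], score)


def group_mean(level):
    vals = [s for r, s in zip(race, score) if r == level]
    return sum(vals) / len(vals)


print(f"intercept = {b0:.6f}   white mean    = {group_mean('white'):.6f}")
print(f"asian     = {b_asian:.6f}   asian - white = {group_mean('asian') - group_mean('white'):.6f}")
print(f"black     = {b_black:.6f}   black - white = {group_mean('black') - group_mean('white'):.6f}")
```

Adding a `white` column to the same call would make X'X singular, which is the redundancy described above.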
 
#4
OK, I am still confused.

I have 2 categorical variables. The text says to give a categorical independent variable values of 0 and 1.
But in my situation I have 2 independent variables, each with 2 categories: Shelf location (End Aisle, Normal) and Dispenser (Yes, No). I already labeled Shelf Location with End Aisle = 1 and Normal = 0. How do I label Dispenser (Yes, No)? Can I label Dispenser with Yes = 1 and No = 0, or something else?
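Following the J-1 rule from earlier in the thread, each two-category variable needs exactly one 0/1 column, coded independently of the other variable. A Python sketch, where `code_binary` is a hypothetical helper and the three-item lists are made-up inputs that only show the mechanics (they are not the store data). In Excel the same column could come from a helper formula such as `=IF(C2="Yes",1,0)`, assuming the Yes/No labels sit in column C.

```python
# Sketch: code each two-category variable separately as one 0/1 column.
# `code_binary` is a hypothetical helper; the short lists are illustrative only.

def code_binary(values, one_label):
    """Return 1 where the value equals `one_label`, else 0 (the other level is the reference)."""
    levels = set(values)
    if one_label not in levels or len(levels) > 2:
        raise ValueError(f"expected a two-level variable containing {one_label!r}, got {sorted(levels)}")
    return [1 if v == one_label else 0 for v in values]


shelf = code_binary(["End Aisle", "Normal", "Normal"], one_label="End Aisle")
dispenser = code_binary(["Yes", "No", "Yes"], one_label="Yes")
print(shelf)      # [1, 0, 0]
print(dispenser)  # [1, 0, 1]
```

With this coding, Normal shelf and No dispenser are the reference groups described by the intercept.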

I am using Excel, so I do not understand SPSS. I need your help ASAP!
 

trinker

#9
Did you check for an interaction between shelf location and dispenser? This could be very important, especially if you're actually advising a company.
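For two 0/1 dummies, the interaction term is just their row-by-row product, entered as a third predictor next to shelf and dispenser. A Python sketch, assuming End Aisle = 1 and dispenser Yes = 1; `interaction` is a hypothetical helper name. The product is 1 only for stores that are End Aisle *and* have a dispenser, so its coefficient measures how much the dispenser's effect changes at an end-aisle location.

```python
# Sketch: an interaction term for two 0/1 dummies is their row-by-row product.
# `interaction` is a hypothetical helper name.

def interaction(x1, x2):
    if len(x1) != len(x2):
        raise ValueError("columns must have the same length")
    return [a * b for a, b in zip(x1, x2)]


# All four shelf/dispenser combinations (1 = End Aisle, 1 = dispenser Yes).
shelf = [0, 0, 1, 1]
dispenser = [0, 1, 0, 1]
print(interaction(shelf, dispenser))  # [0, 0, 0, 1]
```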
 
#10
Sales with shelf location * dispenser (interaction) >> y intercept (sales) = 2,519, interaction slope = -23.99, t-stat = -0.02, p-value = 0.9774

Sales and dispenser >> y intercept (sales) = 3,234, slope = -274, t-stat = -0.62, p-value = 0.53

Sales and shelf location (I took out dispenser) >> y intercept (sales) = 2,696, slope = 911.31, t-stat = 2.22, p-value = 0.0335

So just having shelf location would be better. But how do I recommend a specific shelf location?
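With a single 0/1 predictor, the intercept is the average sales for the reference group and the slope is the estimated difference for the other group. A Python sketch plugging in the shelf-only numbers reported above, assuming the coding End Aisle = 1, Normal = 0 stated earlier in the thread; `predicted_sales` is a hypothetical helper.

```python
# Sketch: predicted sales from the shelf-only model reported above.
# Assumes shelf = 1 for End Aisle and 0 for Normal.

INTERCEPT = 2696.0     # y intercept (sales) reported for the shelf-only model
SLOPE_SHELF = 911.31   # shelf location slope reported for the same model


def predicted_sales(end_aisle):
    """end_aisle: 1 for End Aisle, 0 for Normal."""
    return INTERCEPT + SLOPE_SHELF * end_aisle


print("Normal:   ", round(predicted_sales(0), 2))
print("End Aisle:", round(predicted_sales(1), 2))
```

The model compares only these two categories, so the most it can support is "end aisle versus normal" (about 911 more in sales, p = 0.0335), not a choice among more specific locations.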
 
#12
I conducted a simple linear regression of sales of energy bars on shelf location:
Sales and shelf location (I took out dispenser) >> y intercept (sales) = 2,696, slope = 911.31, t-stat = 2.22, p-value = 0.0335

The table I was given has rows of sales data, each marked as end aisle or normal.

So do you mean I should cut and paste the shelf location data (where it says End Aisle or Normal), re-create it as 2 independent variables, End Aisle and Normal, with sales of energy bars as the dependent variable, and run a multiple regression analysis?
 

noetsi

#13
I meant I did not see the variable that involved shelf location.

Could you state your model as Y variable <----- Intercept + IV1 + IV2 + ..., with each IV being an independent variable?

It would make it a lot easier to comment on. :)
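Written in that form, the interaction model discussed in this thread would look like the formula below, assuming Shelf = 1 for End Aisle (0 for Normal) and Dispenser = 1 for Yes (0 for No). Here \(\beta_3\) is the extra change in sales when a store has both an end-aisle location and a dispenser, beyond the two separate effects.

```latex
\text{Sales}_i = \beta_0 + \beta_1\,\text{Shelf}_i + \beta_2\,\text{Dispenser}_i
               + \beta_3\,(\text{Shelf}_i \times \text{Dispenser}_i) + \varepsilon_i
```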
 
#14
Hi, do you have a personal e-mail address? I have an Excel file with a multiple regression analysis in it, and I wanted to see if you can help me answer a question about it. It would be easier for you to help me if you can see my work.
 
#15
Regression equation with the interaction of shelf location and dispenser, with sales as the dependent variable:
2,519 + 942(shelf location) + 336(dispenser) - 23.99(interaction of location and dispenser)

Again, this is based on a sample of 14 stores. I am given the raw data on the 34 stores: the sales, shelf location (aisle, normal), and presence of a coupon dispenser (yes, no).

So I do not know how to answer the question of how I would advise and recommend using a SPECIFIC shelf location and IN-STORE COUPON DISPENSER to sell bars.
 

noetsi

#17
I do have an e-mail, but all my computers belong to the state, so I could not run your data.

I don't think you could recommend anything from that. First, because 14 locations is likely too little data, and second, because nowhere does that model provide any data on a specific shelf location, only on whether shelf location matters or not. I would run an ANOVA (if you have specific shelf locations) and then do a Tukey HSD. If there are specific shelf locations, it will tell you which works best. Doing this with many dummy variables in regression is a lot harder.
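The first step of that suggestion, the one-way ANOVA F statistic, can be sketched by hand in Python. The thread has no data on specific shelf locations, so this uses the race/score example from earlier in the thread as stand-in groups; `one_way_anova` is a hypothetical helper. This F equals the overall F of a regression on the J-1 dummies for the same variable. The p-value and the Tukey HSD comparisons need the F and studentized range distributions, which is where a stats package comes in (for example `scipy.stats.f_oneway` and, in recent SciPy versions, `scipy.stats.tukey_hsd`).

```python
# Sketch: one-way ANOVA F statistic computed by hand.
# Stand-in data: the race/score example from this thread (no shelf-location data was posted).

scores_by_group = {
    "asian": [0.881808984, 1.817372208, 1.109067710, -0.291707834, -0.691128384],
    "black": [0.865168583, 0.725432147, -0.100645436],
    "white": [0.009416695, 0.036868814],
}


def one_way_anova(groups):
    """Return (F, df_between, df_within, ss_between, ss_within)."""
    values = [x for g in groups.values() for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {name: sum(g) / len(g) for name, g in groups.items()}
    # Between-group SS: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (means[name] - grand_mean) ** 2 for name, g in groups.items())
    # Within-group SS: spread of observations around their own group mean.
    ss_within = sum((x - means[name]) ** 2 for name, g in groups.items() for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within, ss_between, ss_within


f_stat, df_b, df_w, ssb, ssw = one_way_anova(scores_by_group)
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```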