# Multiple Regression

#### student865

##### New Member
If I have 2 categorical variables, how do I assign dummy variables to them? I want to assign 0 and 1 to one categorical variable...can I assign 0 and 1 to another categorical variable?

#### Dason

How many categories are in each variable?

#### trinker

##### ggplot2orBust
Yes, but you need to create a new variable (kinda), i.e. a new column, for the groups in the original categorical variable. You make J-1 recoded columns, where J is the number of groups.

Here's an example:
THIS IS WHAT THE DATA FRAME LOOKS LIKE AND WE WANT TO RECODE RACE AND GENDER
Code:
    race gender        score
1  black female  0.865168583
2  black female  0.725432147
3  asian female  0.881808984
4  asian   male  1.817372208
5  asian   male  1.109067710
6  black female -0.100645436
7  asian female -0.291707834
8  white   male  0.009416695
9  asian female -0.691128384
10 white female  0.036868814
NOTICE RACE WAS RECODED AS ASIAN AND BLACK. I COULD HAVE DONE ASIAN & WHITE OR WHITE & BLACK. IT DOESN'T MATTER, BUT WHATEVER GROUP YOU DON'T INCLUDE WILL BE YOUR COMPARISON (THE Y INTERCEPT) IN THE REGRESSION COEFFICIENTS TABLE.
Code:
    race gender        score race.asian race.black gender.female
1  black female  0.865168583          0          1             1
2  black female  0.725432147          0          1             1
3  asian female  0.881808984          1          0             1
4  asian   male  1.817372208          1          0             0
5  asian   male  1.109067710          1          0             0
6  black female -0.100645436          0          1             1
7  asian female -0.291707834          1          0             1
8  white   male  0.009416695          0          0             0
9  asian female -0.691128384          1          0             1
10 white female  0.036868814          0          0             1
THIS IS HOW IT WOULD LOOK IF YOU RECODED EVERY GROUP IN THE VARIABLE, BUT DOING SO WOULD BE REDUNDANT AND THE COMPUTER PROGRAM WOULD EITHER NOT ACCEPT THE FINAL GROUP OR WOULD AUTOMATICALLY IGNORE IT.
Code:
    race gender        score race.asian race.black race.white gender.female gender.male
1  black female  0.865168583          0          1          0             1           0
2  black female  0.725432147          0          1          0             1           0
3  asian female  0.881808984          1          0          0             1           0
4  asian   male  1.817372208          1          0          0             0           1
5  asian   male  1.109067710          1          0          0             0           1
6  black female -0.100645436          0          1          0             1           0
7  asian female -0.291707834          1          0          0             1           0
8  white   male  0.009416695          0          0          1             0           1
9  asian female -0.691128384          1          0          0             1           0
10 white female  0.036868814          0          0          1             1           0
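The redundancy is easy to check by hand: in every row the three race columns add up to 1, which is the same constant column the regression already uses for the intercept. Here is a minimal Python sketch of that check, using the race values from the table above (the helper name `one_hot` is just for illustration, not from any package):

```python
# Race values copied from the ten rows of the example data frame.
race = ["black", "black", "asian", "asian", "asian",
        "black", "asian", "white", "asian", "white"]

def one_hot(values):
    """Return one 0/1 column per distinct level (the full, redundant coding)."""
    levels = sorted(set(values))
    return {level: [1 if v == level else 0 for v in values] for level in levels}

full = one_hot(race)

# Row-wise sum of race.asian + race.black + race.white.
row_sums = [sum(col[i] for col in full.values()) for i in range(len(race))]
print(row_sums)

# Because the columns always sum to 1, any one of them can be rebuilt
# from the other two, so it carries no extra information.
white_rebuilt = [1 - a - b for a, b in zip(full["asian"], full["black"])]
print(white_rebuilt == full["white"])
```

That exact linear dependence is what makes the software drop or refuse the last column.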
Think of dummy coding as a simple binary yes/no: a 0 is no and a 1 is yes. By recoding J-1 groups we've covered every scenario. So if an observation was 0 for asian and 0 for black, by default they'd be white. In a way these columns are new variables, but they're really not; it's a way of 'tricking' the regression model into accepting non-numeric responses. The entire J-1 set of variables is entered into the model as a block (all at once).
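If it helps to see the J-1 recoding done mechanically, here is a rough Python sketch that rebuilds the recoded columns from the example data. The function name `dummy_code` and its `reference` argument are made up for this illustration; stats packages have their own ways of doing this.

```python
def dummy_code(values, reference):
    """Code a categorical variable as J-1 columns of 0/1, omitting `reference`."""
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"{reference!r} is not one of {levels}")
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }

# Race and gender copied from the ten rows of the example data frame.
race = ["black", "black", "asian", "asian", "asian",
        "black", "asian", "white", "asian", "white"]
gender = ["female", "female", "female", "male", "male",
          "female", "female", "male", "female", "female"]

# 'white' and 'male' are the omitted (comparison) groups, as in the example.
race_dummies = dummy_code(race, reference="white")
gender_dummies = dummy_code(gender, reference="male")

for i in range(len(race)):
    print(race[i], gender[i],
          race_dummies["asian"][i], race_dummies["black"][i],
          gender_dummies["female"][i])
```

Picking a different `reference` only changes which group the other coefficients are compared against.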

#### student865

##### New Member
Ok I am still confused.

I have 2 categorical variables. In the text it says to give a categorical independent variable a value of 0 and 1.
But in my situation I have 2 independent variables with 2 categories each: Shelf Location (End Aisle, Normal) and Dispenser (Yes, No). I already labeled Shelf Location with End Aisle=1 and Normal=0. What do I label Dispenser (Yes and No)? Can I label Dispenser with Yes=1 and No=0, or something else?

I am using Excel, so I do not understand SPSS. I need your help asap!!!

#### Dason

Yes. Recoding both of them using a 1/0 coding will work fine.

#### student865

##### New Member
So if I already labeled Shelf Location with End Aisle=1 and Normal=0 and Dispenser with Yes=0 and No=1, is this OK in Excel?

#### Dason

Assuming you're doing everything else correctly, then yes.
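To see why the direction of the 0/1 labels doesn't matter, here is a small Python sketch for the one-predictor case. It is only an illustration: the sales data was never posted, so it borrows the `score` and gender columns from the example data frame earlier in the thread, and `simple_ols` is a hand-rolled helper, not a library function.

```python
# Scores and gender copied from the example data frame earlier in the thread;
# they stand in for the sales data, which was never posted.
score = [0.865168583, 0.725432147, 0.881808984, 1.817372208, 1.109067710,
         -0.100645436, -0.291707834, 0.009416695, -0.691128384, 0.036868814]
female = [1, 1, 1, 0, 0, 1, 1, 0, 1, 1]   # female=1, male=0
male = [1 - f for f in female]            # flipped coding: male=1, female=0

def simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

b0_f, b1_f = simple_ols(female, score)
b0_m, b1_m = simple_ols(male, score)

print("female=1 coding:", b0_f, b1_f)
print("male=1 coding:  ", b0_m, b1_m)
# The slope only changes sign, and the intercept becomes the mean of
# whichever group is coded 0, so the fitted group means are identical.
```

The t-stat changes sign too, but the p-value stays the same, so the conclusion doesn't depend on which category gets the 1.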

Ok great thanks.

#### trinker

##### ggplot2orBust
Did you check for interaction between shelf location and dispenser? This could be very important, especially if you're actually advising a company.

#### student865

##### New Member
Sales with shelf location * dispenser >> y-intercept (sales) = 2,519, slope = -23.99, t-stat = -0.02, p-value = 0.9774

Sales and dispenser >> y-intercept (sales) = 3,234, slope = -274, t-stat = -0.62, p-value = 0.53

Sales and shelf location (I took out dispenser) >> y-intercept (sales) = 2,696, slope = 911.31, t-stat = 2.22, p-value = 0.0335

So just having shelf location would be better. But how do I recommend specific shelf location?

#### noetsi

##### No cake for spunky
Where in your model did you test specific location?

#### student865

##### New Member
I conducted a simple linear regression of sales of energy bars on shelf location:
Sales and shelf location (I took out dispenser) >> y-intercept (sales) = 2,696, slope = 911.31, t-stat = 2.22, p-value = 0.0335

In the table I am given rows of sales data along with if they are end aisle or normal

So do you mean I cut and paste the shelf location data (where it says End Aisle or Normal) and re-create 2 independent variables, End Aisle and Normal, with sales of energy bars as the dependent variable, and then run a multiple regression analysis?

#### noetsi

##### No cake for spunky
I meant I did not see the variable that involved shelf location.

Could you state your model as Y variable <----- Intercept + IV1 + IV2 ..., with IV being each independent variable?

It would make it a lot easier to comment on.

#### student865

##### New Member
Hi, do you have a personal e-mail address? I have an Excel file I wanted to send you that has a multiple regression analysis on it. I wanted to see if you can help me answer a question on it. It would be easier for you to help me if you can see my work.

#### student865

##### New Member
regression equation for interaction of shelf location and dispenser for sales(dependent variable):
2,519 +942(shelf location) +336(dispenser) -23.99(interaction of location and dispenser)
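One way to read an equation like this is to plug in the four 0/1 combinations and compare the fitted sales. The Python sketch below does that with the coefficients exactly as posted. Two assumptions: shelf location is coded End Aisle=1, Normal=0 as stated earlier, and since the thread mentions both Yes=1 and Yes=0 for dispenser, `dispenser=1` here means whichever category was coded 1 in the spreadsheet. The function name `predicted_sales` is just for illustration.

```python
# Coefficients as posted in the thread.
INTERCEPT = 2519.0
B_SHELF = 942.0
B_DISPENSER = 336.0
B_INTERACTION = -23.99

def predicted_sales(shelf, dispenser):
    """Fitted sales for dummy values shelf and dispenser, each 0 or 1."""
    return (INTERCEPT
            + B_SHELF * shelf
            + B_DISPENSER * dispenser
            + B_INTERACTION * shelf * dispenser)  # only counts when both are 1

# Fitted value for each of the four shelf/dispenser combinations.
fitted = {(s, d): predicted_sales(s, d) for s in (0, 1) for d in (0, 1)}

for (s, d), value in fitted.items():
    print(f"shelf={s} dispenser={d} fitted sales={value:.2f}")
```

These are only point estimates; whether the differences between the four cells are meaningful still depends on the t-stats and p-values of the individual coefficients.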

Again, this is based off a sample of 14 stores. I am given the raw data on the 34 stores: the sales, shelf location (End Aisle, Normal), and presence of a coupon dispenser (Yes, No).

So I do not know how to answer the question of how would I advise and recommend using a SPECIFIC shelf location and IN-STORE COUPON DISPENSER TO SELL BARS.

#### student865

##### New Member
And I meant this is based off a sample of 34 stores, not 14.

#### noetsi

##### No cake for spunky
I do have an e-mail but all my computers belong to the state so I could not run your data.

I don't think you could recommend anything from that. First, because 14 locations is likely too little data, and second, because nowhere in that model is there any data on specific shelf locations, only on whether shelf location is important or not. I would run an ANOVA (if you have specific shelf locations) and then do a Tukey HSD. If there are specific shelf locations, it will tell you which works best. Doing this with many dummy variables in regression is a lot harder.