# [Excel] - How to assign numerical values to text data on Excel

#### StatsClue

##### Member
Column: Race- has Blacks, White, Hispanics, Asians etc.

In the next column I need to give these individual races numerical values, for eg. Black is 1, White is 2 and so on.

How to do it?

Thanks.

#### SiBorg

##### New Member
Lets say you have column A with the race and you want to put the numerical value into column B

Somewhere else on your spreadsheet (doesn't matter where) let's say, column E and F you can list the name in column E and the numerical equivalent in column F

So, e.g.

E F

Black 1
Caucasian 2
Hispanic 3
Asian 4

Or whatever you want.

You then put the following formula into column B

=VLOOKUP(A1,$E$1:$F$4,2)

This formula looks up the value from cell A1 in the array E1:F4 and returns the value from the second column in E1:F4

The \$ are important so that when you drag the copy handle and copy the formula to the other cells the A1 reference changes to A2, A3, etc, but the E1:F4 stays constant.

Hope this helps.

#### Dason

Also just out of curiosity - why do you want to convert everything to a number?

#### StatsClue

##### Member
Thanks SiBorg. I figured out another way to do it using the 'IF' function. Thanks for trying to help.

Dason- I think a regression analysis on SAS necessarily requires you to have numerical values- so no other reason .

#### Dason

Dason- I think a regression analysis on SAS necessarily requires you to have numerical values- so no other reason .
SAS will do the conversion for you. You just tell SAS that your variable is categorical and it does the rest for you. Typically the software you're using will do the dummy coding for you.

Plus just doing a direct coding into numbers isn't the right thing to do. See this thread at the stats stackexchange site for explanations why: http://stats.stackexchange.com/questions/20166/how-to-implement-dummy-variable-using-n-1-variables

#### StatsClue

##### Member
SAS will do the conversion for you. You just tell SAS that your variable is categorical and it does the rest for you. Typically the software you're using will do the dummy coding for you.

Plus just doing a direct coding into numbers isn't the right thing to do. See this thread at the stats stackexchange site for explanations why: http://stats.stackexchange.com/questions/20166/how-to-implement-dummy-variable-using-n-1-variables
Oh, thanks for bringing this to attention. I knew vaguely about dummy variables but when SAS refused to perform my [prog reg, model outcome=predictor, run] I thought numerical codes would solve the problem (and dummy variables seemed like a complicated procedure). So you're saying the results using numerical codes vs. dummy variables would be different?

When is coding acceptable? Is it okay to code when there are only two values, for example, yes/no, red/blue, male/female ? Asking 'cause I've done it already! Crap.

#### Mean Joe

##### TS Contributor
When SAS refused to work with Black/White/etc, you needed to put a CLASS statement in the PROC, eg
Code:
proc reg;
class Race;
model ...;
You can keep 1/2/3/4/etc for the coded Race, but put in a CLASS statement again. Then you will get what you want: compare 2 vs 1 (eg Black vs White), 3 vs 1 (eg Asian vs White), 4 vs 1 etc.
If you do not put the CLASS statement, then you are implying that there is a consistent trend in races and as the race value increases (2 to 3 to 4) the trend continues (linearly). Obviously not what you want to do with races.

#### StatsClue

##### Member
When SAS refused to work with Black/White/etc, you needed to put a CLASS statement in the PROC, eg
Code:
proc reg;
class Race;
model ...;
You can keep 1/2/3/4/etc for the coded Race, but put in a CLASS statement again. Then you will get what you want: compare 2 vs 1 (eg Black vs White), 3 vs 1 (eg Asian vs White), 4 vs 1 etc.
If you do not put the CLASS statement, then you are implying that there is a consistent trend in races and as the race value increases (2 to 3 to 4) the trend continues (linearly). Obviously not what you want to do with races.
Thanks.

The class statement doesn't seem to be working with proc reg data=xyz; . The moment I put class underneath this, it shows up in red and SAS doesn't run eventually. I'd put : class racecode (ref='0')/param=ref; . Didn't work.

#### Dason

Does proc reg take a class option? I don't think it does. You should probably use proc glm for this kind of case.

#### StatsClue

##### Member
Does proc reg take a class option? I don't think it does. You should probably use proc glm for this kind of case.
Right, it doesn't.

I tried creating dummy variables for race. Should the comparison dummy variable always be coded '0'? Thanks.

#### StatsClue

##### Member
SAS will do the conversion for you. You just tell SAS that your variable is categorical and it does the rest for you. Typically the software you're using will do the dummy coding for you.

Plus just doing a direct coding into numbers isn't the right thing to do. See this thread at the stats stackexchange site for explanations why: http://stats.stackexchange.com/questions/20166/how-to-implement-dummy-variable-using-n-1-variables
Oh and is direct coding alright if the variable has only two possibilities: eg Gender, type of drug used (eg. either X or Y) etc.? In this case, would the following be right:
proc reg data=mydata;
model outcome=gender;
run;

?
(coded gender 1 and 2)

Also, if the outcome is dose and the gender-dose association comes out to be significant, how do you tell from the SAS output which gender gets the higher dose?
Thanks.

Last edited:

#### Dason

Coding gender as 1/2 isn't a a very good idea. Typical coding schemes usually either code it as 0/1 or -1/1. I prefer 0/1 in which case the parameter associated with gender would tell you the estimate mean difference between the genders. But you probably would be better off using proc glm and letting it take care of the coding for you.

#### StatsClue

##### Member
Coding gender as 1/2 isn't a a very good idea. Typical coding schemes usually either code it as 0/1 or -1/1. I prefer 0/1 in which case the parameter associated with gender would tell you the estimate mean difference between the genders. But you probably would be better off using proc glm and letting it take care of the coding for you.
Thanks. You mean the parameter estimate associated with gender would tell the mean difference between genders NOT if the coding is 1/2? Also, any way to tell which gender is associated with a higher dose?

Also my previous question- must the comparison variable for dummy variables always be coded 'o'? I mean, should it always be 'else 0'?

Stats not my field so at sea. Pardon the naive questions. Thanks.

#### Dason

Lets say you code males 0 and females 1. Then let's say the parameter associated with gender was estimated to be 2. That would mean that you would estimate that the average ... whatever your dependent variable is... is 2 units higher on average for females than it is for males.

#### SiBorg

##### New Member
Out of interest, say you were performing logistic regression with SPSS rather than SAS and had the races coded 1-4. Would specifying the data type as 'nominal' rather than 'ordinal' create the desired effect of removing the linear relationship between the coded values?

#### StatsClue

##### Member
Also, when the variable has multiple levels (eg. race) is coding 0,1,2,3 preferable to 1,2,3,4 or is it immaterial? Thanks.

#### noetsi

##### Fortran must die
FYI if you use Enterprise Guide for logistic regression remember than it changed a lot in 4.2. This includes the class statement but also you can no maximize 1 (as most software always did) rather than 0 as SAS traditionally did.

I am sure this is true in the actual coding (well it has to be) but I try to avoid such

#### taylor

##### New Member
Hi I came across this forum incidentally. Probably you are interested in functions of excel like me. This is the reason why I would like to share the following with you. <a title="http://www.excel-aid.com/excel-numeric-custom-autofilters-for-numeric-columns.html" href="http://www.excel-aid.com/excel-numeric-custom-autofilters-for-numeric-columns.html">http://www.excel-aid.com/excel-numeric-custom-autofilters-for-numeric-columns.html</a> Especially the video helped me.

#### andrem07

##### New Member
Column: Race- has Blacks, White, Hispanics, Asians etc.

In the next column I need to give these individual races numerical values, for eg. Black is 1, White is 2 and so on.

How to do it?

Thanks.
In case you have a long list of text data in excel (e.g. names of states, countries etc.) and you want to import this to SAS or STATA, you first need to assign a numerical value to entries in the list. This can be easily achieved using the "IF" function in Excel. I will use the Penn World Table to illustrate:

Procedure:

1) Insert an extra column.

2) In the Penn World Table, the first country is Angola (country iso code AGO) and assuming that you have 41 year observations for each country ranging from 1960 to 2000 - the observations for Angola will run from row 2 to row 42. The next country is Albania (country iso code ALB) and its observations run from row 43- row 83, etc. (observations start from the second row since the first row contains variable names)

3) In the new column, put a 1 in the second row (corresponding to the first observation for Angola). Then, use the "IF" function to specify that all remaining entries for Angola be coded "1", Albania "2", etc. Assuming that the iso_code identifier is in column B in Excel, the following formula should do it

"=IF(B3= B2,C2,C2+1)"

Recall that we put a 1 in the second row of the new column ( "C2" in this case). So in cell C3, we have instructed Excel to assign value 1 if the country identifier in column B3 is the same as the country identifier in column B2, ELSE - assign a value of C2+1 = 2.

4) Drag down this formula until the bottom and each country will now have a unique numerical identifier.

Andrew Musau
Faculty of Economics and Social Sciences
University of Agder