how to use characters when calling values for a variable?

#1
Hi,

I have a database with a variable "stage" with the following values: 1, 2, 1A, 1B, 2A, 2B.

I am trying to trim my data by finding all the people where the variable "stage" = 1B and delete them from the dataset.

I tried using this code:

data = allpeople_no1B;
set library.allpeople;
if stage = 1B then delete;
run;


However, SAS won't recognize 1B as a value of the variable stage even though it is in the dataset. I think it has something to do with the fact that there is letter ("B") because when I run the same code with 1, like this:

data = allpeople_no1B;
set library.allpeople;
if stage = 1 then delete;
run;

it works fine.

How can I get SAS to recognize "1B"?

thanks,
kungfupanda
 
#2
It seems that “stage” is a variable of type text, not numeric. In that case, your code should be (note the use of inverted commas):
Code:
data = allpeople_no1B;
	set library.allpeople;
	where (upcase(compress(stage)) ~= "1B"); 
run;
(Note: “if (upcase(compress(stage)) = "1B") then delete;” and “if (upcase(compress(stage)) ~= "1B");” will also work; however, the “where (upcase(compress(stage)) ~= "1B");” construct is more efficient because it means that SAS won’t populate the entire data vector for those observations that don’t meet the “where” condition. The difference becomes more noticeable as the dataset size increases.)
 
#3
Thank-you so much! The code

“if (upcase(compress(stage)) = "1B") then delete;”

actually worked except it deleted everything except for 1B, whereas I wanted to delete 1B- is there any way to alter the code for this?

Alternatively, is there some way to add multiples to keep, for example, delete everything except for 1A + 1C?

I tried “if (upcase(compress(stage)) = "1B", "1C") then delete;” but that failed.

thanks again!
 
#4
What you’re saying about “if (upcase(compress(stage)) = "1B") then delete;” keeping only “1B” values is contradictory and makes no sense to me. Are you sure you didn’t have it as “if (upcase(compress(stage)) ~= "1B") then delete;” instead? (The tilde character “~” is a logical not operation in SAS.)

For the reason already stated, it’s good practice to use a “where” clause when sub-setting data in SAS.

To exclude multiple values of “stage” use “where (~(upcase(compress(stage)) in ("1B", "1C")));”.

To keep multiple values of “stage” use “where (upcase(compress(stage)) in ("1B", "1C"));”.