Reading (tricky) Data Into SAS

#1
Hello Every One,

I am new to SAS, here is the input file we have, need to get the output as listed below.

  • Need to delete ''====".
  • Need to list Male/Female under Gender Column
  • Need to list 20/30 under AgeGroup column

Thanks in Advance. :rolleyes:



Input Data




Code:
Female  20 
=================
Name	Dept
=================
Mary	IT
Jill	Sales

Male	30
=================
Name	Dept
=================
Tom	IT
John	Sales



OUT PUT


Code:
Gender	AgeGroup   	Name	Dept

Female	20		Mary	IT
Female	20		Jill	Sales
Male	30		Tom	IT
Male	30		John	Sales
 
#2
reading hierarchical text with junk lines

There could actually be a few sticky points on this, but in general it isn't too bad.

DATA newfile; RETAIN gender age; DROP RECTYPE;
*retain is neccesary to hold onto the values for gender and age;
*rectype will be used to tell SAS what to do with each row of data. you don't want to keep it after that;
INFILE 'C:\JUNK\junksasfile.txt' lrecl=231 MISSOVER;
*infile is reading the file I saved with your example data;
*lrecl sets record length long enough to take any line;
*missover is used since there are different number of variables depending on type of line otherwise you get wrapping;

INPUT RECTYPE $ @;
*reads first string up to a blank, names it rectype;
*@holds the line open until you're done with it;
IF RECTYPE IN ('Female','Male') THEN DO;
INPUT Age;
Gender=rectype;
END;
*This block reads in gender and age;
*SAS reads the variables in order... first rectype then age. You'll drop rectype, so make gender = rectype;
*the retain statement will hold these values until it encounters another line starting with "male' or 'female';

ELSE IF RECTYPE='========' THEN DO;
delete;
end;
*this block deletes lines starting with a string of at least 8 '=' signs;
*default variable length for SAS is 8;

ELSE IF RECTYPE='Name' THEN DO;
delete;
end;
*gets rid of pesky label lines;

else do;
INPUT Department $;
Name=rectype;
OUTPUT;
END;
* thefirst variable is already read in, the next variable is department;
*rectype will be equal to the first eight characters of the name, so name=rectype;
run;

This will work for the data example you gave, but there are a couple caveats. If you have various types of names, like "John Q. Public" this won't work as it assumes a blank is a delimiter. If the 8 character string isn't long enough, you'll need to set a variable length and include a decimal point after the name in the input statement to tell SAS to scan until it finds the blank (this might also include specifying the delimiter). If you have those type problems, post again. I didn't spend much time checking this, the output file isn't exactly what you want but since it is a sas dataset it doesn't matter. If you want to spit out a flat file, you can find out how to do that pretty easilly.

Brian