Agency IDs organization

#1
I would like to locate/organize my valid agency IDs (agencyid).

agencyid

CA0006
TN0106
NY0009
LA0035
TN0105
TN01o
CA0017
cA0006
La0035

The correct structure of the agency ID should be the state letters = 4 numbers ex,. (NY1111)The first thing I wanted to do is to make the first two letter of my agencyid variable upper case and maybe code as 9 (invalid ids) the ids that do not have the correct structure (NY1111) such as id TN01o. How can I do this?? thank you!

Thank you!
 
Last edited:

bukharin

RoboStataRaptor
#2
I would probably use regular expression matching for this. I would first use trim() to get rid of leading or trailing spaces, and upper() to convert the IDs to upper case. For example:
Code:
gen byte valid=regexm(upper(trim(agencyid)), "^[A-Z][A-Z][0-9][0-9][0-9][0-9]$")
See -help regexm- and here
 
#3
Thank you... I was reading about regexm.. but it is a little bit complicated. The problems is that the data was entered via a scanner. Thank you for your syntax above (it worked perfectly. But now another problem.... For example: There is an agency id NY1100 but the machine reads it as NYll00; the letter LL instead the number 11. Most of the problems are with L instead 1; and O (letter) instead 0. Remember that the agency ID structure is two Letters and 4 numbers NY1100. How can I tell stata to change L/l for 1 and 0 for O/o (letter) but only in the four last places of my string variable? I googled it but I didn't find a way to do this. I would greatly appreciate it.
 

bukharin

RoboStataRaptor
#4
This should be pretty easy for you to figure out. Have a read of the string functions (-help string functions-) and see how you go.