# Extract State letters from string variable

#### Marvin85

##### Member
Hi, I have a variable that contains numbers and letters. Ideally, participants had to enter their two-letter state code first (e.g. NY), followed by certain numbers and letters.

1 NY12OPR
2 OH34789
3 FL3
4 45OHnmo56
5 ny1234

I would like to extract the first two letters of this string variable, which are supposed to be the state. How can I do it in Stata? Is there another way to deal with this, taking into account that some participants did not follow the correct format when entering their IDs? For example, participant 4 does not have the state letters at the beginning, and participant 3 has two blank spaces at the beginning.

Thank you!

#### bukharin

##### RoboStataRaptor
1. Strip out the numbers:
Code:
foreach n of numlist 0/9 {
    replace state=subinstr(state, "`n'", "", .)
}
2. Trim the blank spaces:
Code:
replace state=trim(state)
3. Extract the first two letters and convert to upper case:
Code:
replace state=upper(substr(state, 1, 2))
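If you are on Stata 14 or later, the three steps above could probably be collapsed into one line with the Unicode regex function `ustrregexra()`. This is only a sketch: it assumes the raw entries are in a string variable called `id` and writes to a new variable called `state_re` (both names are made up here, so substitute your own).

```stata
* Sketch only: id and state_re are hypothetical variable names.
* ustrregexra() removes every character that is not a letter (digits and
* blanks included), substr() keeps the first two letters that remain,
* and upper() converts them to upper case.
generate state_re = upper(substr(ustrregexra(id, "[^A-Za-z]", ""), 1, 2))
```

Because everything that is not a letter is removed before the first two characters are taken, an entry like 45OHnmo56 or one with leading blanks should still come out as a two-letter code.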


#### bukharin

##### RoboStataRaptor
After that, run:
tab state, mis

This lets you check that the states are all valid, and look at the missing ones to see why they're missing. Of course it might be best to work with a copy of the original variable rather than making all of these modifications to the original. Then you can look at the original variable to see why the final one is missing (or wrong).
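Working on a copy could look something like the sketch below. It assumes the raw entries live in a string variable called `id` (a made-up name) and that no variable called `state` exists yet.

```stata
* Sketch only: id is a hypothetical name for the original string variable.
* Copy the original so it stays untouched.
generate state = id

* Strip the digits 0 through 9 from the copy.
forvalues n = 0/9 {
    replace state = subinstr(state, "`n'", "", .)
}

* Trim blanks, keep the first two letters, and convert to upper case.
replace state = upper(substr(trim(state), 1, 2))

* Show the raw entries that did not yield a full two-letter code.
list id state if length(state) < 2
```

The final `list` puts the original entry next to the extracted code, so you can see what the participant actually typed.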

#### Marvin85

##### Member
Wow bro! You are the man! How long have you been using Stata? I am so far behind.

#### bukharin

##### RoboStataRaptor
Just short of 4 years... but I have only scratched the surface... it's a vast program!

In terms of learning Stata, the best thing I ever did was read the User's Guide from cover to cover. It covers most of what you need and is well written with clear examples. If you want to use Stata more than casually this is well worth the time investment.