Extract State letters from string variable

Hi, I have a string variable that contains numbers and letters. Participants were supposed to enter their two-letter state abbreviation first (e.g. NY), followed by certain numbers and letters.

2 OH34789
3 FL3
4 45OHnmo56
5 ny1234

I would like to extract the first two letters of this string variable, which are supposed to be the state. How can I do this in Stata? Is there another way to deal with this, taking into account that some participants did not follow the correct format when entering their IDs, such as participant 4, who does not have the state letters at the beginning, or participant 3, who has two blank spaces at the beginning?

Thank you!


1. Strip out the numbers:
foreach n of numlist 0/9 {
    replace state=subinstr(state, "`n'", "", .)
}
2. Trim the blank spaces:
replace state=trim(state)
3. Extract the first two letters and convert to upper case:
replace state=upper(substr(state, 1, 2))


After that, run:
tab state, mis

This lets you check that the states are all valid and look at the missing ones to see why they're missing. It is probably best to work with a copy of the original variable rather than making all of these modifications to the original itself. That way you can look at the original variable to see why the final one is missing (or wrong).
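
Putting the steps together on a copy might look something like the sketch below. It assumes the raw IDs are stored in a string variable called id; that name is only a placeholder, so substitute your own. The final list line is just one possible way to pull up the problem cases next to the original entry.

```stata
* Assumption: the raw IDs are in a string variable named id (hypothetical name)
* Work on a copy so the original entries stay untouched
generate state = id

* 1. Strip out the digits 0 through 9
foreach n of numlist 0/9 {
    replace state = subinstr(state, "`n'", "", .)
}

* 2. Trim leading and trailing blanks
replace state = trim(state)

* 3. Keep the first two remaining letters, in upper case
replace state = upper(substr(state, 1, 2))

* Check the result, including empty values
tab state, missing

* Compare against the original for entries that ended up with fewer than two letters
list id state if length(state) < 2
```

Note that an entry such as 45OHnmo56 ends up as OH under this approach, because the digits are removed before the first two characters are taken.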


Just short of 4 years... but I have only scratched the surface... it's a vast program!

In terms of learning Stata, the best thing I ever did was read the User's Guide from cover to cover. It covers most of what you need and is well written with clear examples. If you want to use Stata more than casually this is well worth the time investment.