# Extract numbers from a string

#### kaisermuhle

##### New Member
I have this data of duration in the following formatT2H4M37S
and I want to extract the hours, minutes and seconds separately. I tried to use split command but it didn't work because the format is not uniform. They can be:
PT2H
PT2H4M
PT2H4M37S
PT4M
PT4M37S
PT37S
(not to mention if number of seconds is less than 10, it is displayed PT9).
In Excel, I can do an IFERROR(SUBSTRACT(LEFT(RIGHT(FIND...)))) to find the correct separation but I have trouble in Stata. Any help / suggestion? Thanks

#### Robert Picard

##### New Member
You can do this using regex functions

Code:
clear
input str9 sandbox
PT2H
PT2H4M
PT2H4M37S
PT4M
PT4M37S
PT37S
end

gen hours = regexs(1) if regexm(sandbox,"([0-9]+)H")
gen mins = regexs(1) if regexm(sandbox,"([0-9]+)M")
gen secs = regexs(1) if regexm(sandbox,"([0-9]+)S")

list

#### kaisermuhle

##### New Member
Thank you Robert Picard it works very well. But do I have to put each of the element in the sandbox variable? Suppose I have 10,000 observations, each of them is different from one another due to duration, is there anyway I can import a variable in the input? Thanks

#### Robert Picard

##### New Member
Sorry but I don't understand what you are saying. If your substrings are part of a larger string variable, then the commands I posted should still work, just use the appropriate variable name instead of the one I used (sandbox). If you are talking about something else, can you post a small representative data example, preferably generated using dataex (from SSC). You can install it using

Code:
ssc install dataex

#### kaisermuhle

##### New Member
Hi Robert thanks for your reply. My data looks like this. How do you think I should put all of that after the input command?
duration
"PT2M8S"
"PT3M13S"
"PT1M24S"
"PT37S"
"PT20S"
"PT55S"
"PT1M2S"
"PT40S"
"PT50S"
"PT3M17S"
"PT55S"
"PT32S"
"PT18S"
"PT37S"
"PT1M40S"
"PT39S"
"PT3M16S"
"PT1M34S"
"PT1M18S"
"PT1M26S"
"PT1M40S"
"PT1M33S"
"PT19S"
"PT21S"
"PT25S"
"PT37S"
"PT29S"
"PT38S"
"PT2M31S"
"PT1M8S"
"PT2M19S"
"PT44S"
"PT50S"
"PT40S"
"PT52S"
"PT51S"
"PT1H2M49S"
"PT2H49M3S"
"PT3M14S"
"PT50S"
"PT2M39S"
"PT58S"
"PT25S"
"PT34S"
"PT11S"
"PT12S"
"PT9S"
"PT2M25S"
"PT1M51S"
"PT13S"
"PT13S"
"PT16S"
"PT16S"
"PT22S"
"PT16S"
"PT16S"
"PT16S"
"PT16S"
"PT16S"
"PT15S"
"PT16S"
"PT15S"
"PT16S"
"PT16S"
"PT15S"
"PT3H58M12S"
"PT16S"
"PT38S"
"PT1M33S"
"PT46S"
"PT53S"
"PT51S"
"PT48S"
"PT2M30S"
"PT1M53S"
"PT1M13S"
"PT1M40S"
"PT2M10S"
"PT1M30S"
"PT1M25S"
"PT1M57S"
"PT1M22S"
"PT1M24S"
"PT2M2S"
"PT1M35S"
"PT53S"
"PT1M39S"
"PT3M18S"
"PT2M16S"
"PT10S"
"PT1M2S"
"PT1M36S"
"PT44S"
"PT1M18S"
"PT1M16S"
"PT1H1M2S"
"PT1M"
"PT1M5S"
"PT2M31S"
"PT1M54S"
"PT1M24S"
"PT1M33S"
"PT1M59S"
"PT1M33S"
"PT1M25S"
"PT56S"
"PT49S"
"PT57S"
"PT2M43S"
"PT10S"
"PT29S"
"PT2M31S"
"PT1M56S"
"PT30S"
"PT1M2S"
"PT1H43S"
"PT43S"
"PT28S"
"PT46S"
"PT28S"
"PT45S"
"PT46S"
"PT1M36S"
"PT4M51S"
"PT3M31S"
"PT20S"
"PT18S"
"PT1M47S"
"PT21S"
"PT58S"
"PT1M24S"
"PT1M42S"
"PT1M8S"
"PT23S"
"PT25S"
"PT4M35S"
"PT38S"
"PT1M53S"
"PT2M25S"
"PT20S"
"PT1M37S"
"PT1M55S"
"PT19S"
"PT1M3S"
"PT1M6S"
"PT2M9S"
"PT1M51S"
"PT1M54S"
"PT16S"
"PT1M37S"
"PT1M48S"
"PT1M41S"
"PT1M39S"
"PT5M4S"
"PT1M"
"PT2M42S"
"PT1M39S"
"PT1M20S"
"PT2M42S"
"PT1M39S"
"PT1M22S"
"PT1M13S"
"PT1M49S"
"PT1M49S"
"PT14S"
"PT1M8S"
"PT1M42S"
"PT54S"
"PT1M17S"
"PT1M26S"
"PT1M23S"
"PT57S"
"PT1M24S"
"PT45S"
"PT48S"
"PT50S"
"PT29S"
"PT24S"
"PT34S"
"PT42S"
"PT44S"
"PT52S"
"PT22S"
"PT23S"
"PT1M15S"
"PT1M3S"
"PT17S"
"PT52S"
"PT20S"
"PT58S"
"PT1M2S"
"PT47S"
"PT1M31S"
"PT1M5S"
"PT58S"
"PT1M16S"
"PT1M3S"
"PT1M2S"
"PT58S"
"PT1M7S"
"PT58S"
"PT1M5S"
"PT1M"
"PT1M6S"
"PT1M38S"
"PT1M13S"
"PT1M41S"
"PT46S"
"PT2M7S"
"PT51S"
"PT1M37S"
"PT3M34S"
"PT59S"
"PT52S"
"PT54S"
"PT1M6S"
"PT3M13S"
"PT58S"
"PT11M9S"
"PT51S"
"PT49S"
"PT20S"
"PT54S"
"PT1M36S"
"PT51S"
"PT1M25S"
"PT59S"
"PT7S"
"PT20S"
"PT36S"
"PT39S"
"PT40S"
"PT46S"
"PT47S"
"PT31S"
"PT1M32S"
"PT1M34S"
"PT1M34S"
"PT1M44S"
"PT3M1S"
"PT41S"
"PT5M36S"
"PT1M31S"
"PT1M31S"
"PT1M31S"
"PT32S"
"PT48S"
"PT1M1S"
"PT4M1S"
"PT2M3S"
"PT1M24S"
"PT37S"
"PT2M3S"
"PT59S"
"PT2M4S"
"PT2M15S"
"PT1M27S"
"PT2M50S"
"PT3M32S"
"PT4M46S"
"PT1M13S"
"PT5M17S"

#### Robert Picard

##### New Member
I think you misunderstood the purpose of the section of code that creates a data example. It's there so that you can test and see the results using a fully functional example of a solution. In your case, if your variable is called duration, then all you need is to load your data in memory and type

Code:
gen hours = regexs(1) if regexm(duration,"([0-9]+)H")
gen mins = regexs(1) if regexm(duration,"([0-9]+)M")
gen secs = regexs(1) if regexm(duration,"([0-9]+)S")

#### kaisermuhle

##### New Member
thank you very much Robert. It's very helpful!