Question on strptime with peeling out hours

ledzep

Point Mass at Zero
#1
Dear R users,

I am trying to peel out information on hours from a column which contains information on date and hours.

An example

Code:
# data
x<-dput(structure(list(date = structure(c(4L, 3L, 1L, 5L, 2L), .Label = c("01/14/92 01:03:30", 
"02/01/92 16:56:26", "02/27/92 22:29:56", "02/27/92 23:03:20", 
"02/28/92 18:21:03"), class = "factor")), .Names = "date", row.names = c(NA, 
-5L), class = "data.frame"))

>x
               date
1 02/27/92 23:03:20
2 02/27/92 22:29:56
3 01/14/92 01:03:30
4 02/28/92 18:21:03
5 02/01/92 16:56:26
I can easily gather the year (yyyy-mm-dd) information using strptime, whereas I am stuck separating out hours.

Code:
x$Year<-strptime(x$date, "%m/%d/%y")   # peels out year info
> x
               date       Year
1 02/27/92 23:03:20 1992-02-27
2 02/27/92 22:29:56 1992-02-27
3 01/14/92 01:03:30 1992-01-14
4 02/28/92 18:21:03 1992-02-28
5 02/01/92 16:56:26 1992-02-01

# trying the same to dig out hours doesn't seem to work, as NAs are returned
x$hours<-strptime(x$date, "%H:%M:%S")    # trying to peel out hours
>x
               date       Year hours
1 02/27/92 23:03:20 1992-02-27  <NA>
2 02/27/92 22:29:56 1992-02-27  <NA>
3 01/14/92 01:03:30 1992-01-14  <NA>
4 02/28/92 18:21:03 1992-02-28  <NA>
5 02/01/92 16:56:26 1992-02-01  <NA>
So, I haven't been able to figure out how to create a new column based on hours. Any suggestions on how to do it?

Many Thanks.
(& Happy New Year 2012)
 

Dason

Ambassador to the humans
#2
You don't actually have x$date stored as dates so it's a little bit harder to work with. But it's not too bad to pull out what you want just working with it as a character. Plus I don't work with dates that often anyway so oh well.

Code:
x$Hour <- sapply(strsplit(as.character(x$date), "[[:space:]:]"), function(x){x[2]})

# Hopefully it's not too confusing - I'll break it down.

# We want to split the string as characters based on spaces and :
# This gives us a list
j <- strsplit(as.character(x$date), "[[:space:]:]")

# Then from that output we want to grab the second element in each piece of the list
sapply(j, function(x){x[2]})

# Pulling it together without storing into a temporary variable gives the original expression.
If you wanted to grab minutes and seconds it would make sense to create that temporary variable and then just use sapply to grab the pieces you want.
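
A minimal sketch of that suggestion, assuming the split is done on whitespace and colons with the `[[:space:]:]` character class. The names `dates`, `parts`, and `pick` are made up for illustration and are not from the original post:

```r
# Sample strings in the same "mm/dd/yy HH:MM:SS" layout as x$date
dates <- c("02/27/92 23:03:20", "01/14/92 01:03:30")

# Split once on whitespace or ":"; each element is c(date, hour, minute, second)
parts <- strsplit(dates, "[[:space:]:]")

# Hypothetical helper: take the i-th piece of every list element
pick <- function(i) sapply(parts, `[`, i)

hour   <- pick(2)
minute <- pick(3)
second <- pick(4)
hour
# [1] "23" "01"
```

The pieces come back as character strings, so wrap them in as.integer() if you need numbers.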
 

bryangoodrich

Probably A Mammal
#3
First, your data is stored as factor. I would install and load the chron package when dealing with date-time. Put it this way:

Use class Date when dealing with dates only.
Use class chron when dealing with dates and time only.
Use class POSIX when dealing with dates, time, and time zones.

Now, when you convert your object to a chron object, you get access to some convenient wrappers such as hours and months. They automatically return what you want.

Note, you may need to use as.character when converting your factors. I ran into this problem when I learned how to deal with date data a few months ago! Real pain to miss that lol
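
For comparison, here is a base-R sketch that is not from the thread. The `strptime(x$date, "%H:%M:%S")` call in the question returns NA because the format has to describe the string from its first character, and these strings start with the date. Parsing with the full format and then pulling the hour out with format() avoids that. The variable names and `tz = "UTC"` are assumptions chosen for illustration:

```r
# Factor column as in the question; as.character() guards against factor codes
date <- factor(c("02/27/92 23:03:20", "01/14/92 01:03:30"))

# The format must cover the whole string, date part included
dt <- strptime(as.character(date), "%m/%d/%y %H:%M:%S", tz = "UTC")

hour_chr <- format(dt, "%H")        # zero-padded character hours
hour_int <- dt$hour                 # hours taken from the POSIXlt components
time_chr <- format(dt, "%H:%M:%S")  # full time of day as character
time_chr
# [1] "23:03:20" "01:03:30"
```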
 

trinker

ggplot2orBust
#4
Dason said:
You don't actually have x$date stored as dates so it's a little bit harder to work with. But it's not too bad to pull out what you want just working with it as a character.
Ditto for my approach!

Code:
x$Time <- do.call("rbind", strsplit(sub(" ", ";", x$date), ";"))[,2]
x$hour <- do.call("rbind", strsplit(sub(":", ";", x$Time), ";"))[,1]
x
I didn't know if you actually want just hours or want it in H:M:S format. My approach gives both.

PS: Dason beat me to the answer :)

EDIT: Purely for the learning...

The date and time could also have been separated in the following manner. This has proven useful to me on many occasions for splitting strings into columns.
Code:
x <- do.call("rbind", strsplit(sub(" ", ";", x$date), ";"))
colnames(x) <- c("date", "time")
x
 

bryangoodrich

Probably A Mammal
#5
Apparently we all jumped on this at the same time this morning! lol (I just woke up)

Note, the chron function can be a bit of a pain because you'll need to break up your data per the above manipulations to get a date and time object (or just substring the date object on-the-fly since the sizes are fixed). See the help files for it. Note, the nice thing about chron is you can use the date aggregates (months and years) on the other date types.

My fix:

Code:
library(chron)  # not in base R; install.packages("chron") first
x <- transform(x, date = as.character(date))  # For good measure!
x <- transform(x, date = substring(date, 1, 8), time = substring(date, 10, 17))
x <- transform(x, datetime = chron(date, time))
hours(x$datetime)
# 23 22 1 18 16

months(x$datetime)
# [1] Feb Feb Jan Feb Feb
# 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec

as.Date(x$datetime)
# [1] "1992-02-27" "1992-02-27" "1992-01-14" "1992-02-28" "1992-02-01"
You'll have to do additional processing to control your output or whatever for pretty display, but now your data is in a nice format to control computationally.
 

ledzep

Point Mass at Zero
#6
Thanks Dason for your help and for the comments (always helps :) ).

Actually, what I was looking for was slightly different (my fault, as I didn't put down my desired output). However, I have made a slight change to your code to get what I want.

Code:
# slightly different as I want detailed hour information 
x$Hour <- sapply(strsplit(as.character(x$date), "[ \\]"), function(x){x[2]})

> x
               date       Date     Hour
1 02/27/92 23:03:20 1992-02-27 23:03:20
2 02/27/92 22:29:56 1992-02-27 22:29:56
3 01/14/92 01:03:30 1992-01-14 01:03:30
4 02/28/92 18:21:03 1992-02-28 18:21:03
5 02/01/92 16:56:26 1992-02-01 16:56:26
Many Thanks.
 

trinker

ggplot2orBust
#7
I think chron isn't in base R, so you'll have to get the package from CRAN if you want to go bryangoodrich's route. (Which approach you take depends on whether you really want to work with these dates and times or just want to "peel" them off for simple use.)

I didn't know about chron (or forgot). I attempted this with as.POSIXct but couldn't get the results I wanted. Nice info about chron.
 

Dason

Ambassador to the humans
#8
I thought bryan would come in with some good advice. I was just offering a hack to get at what you want, but dealing with the appropriate types would be safer: if something isn't formatted correctly, it will most likely let you know that it can't do what you want, and you'll be able to figure that out.

I tried briefly getting your date as POSIX but then thought it was too much of a hassle and made that hack.
 

trinker

ggplot2orBust
#9
From ledzep's comments I think he only wanted a hacky fix (but the learning was great, bryangoodrich). I figured we needed a New Year's race with each of the hacky methods to see whose horse wins.

Code:
library(rbenchmark)  # provides benchmark(); not in base R
benchmark(
 dason = sapply(strsplit(as.character(x$date), "[ \\]"), function(x){x[2]}),
 tyler = do.call("rbind", strsplit(sub(" ", ";", x$date), ";"))[,2],
 bryangoodrich = substring(x$date, 10, 17), 
replications = 1000)

           test replications elapsed relative user.self sys.self user.child sys.child
3 bryangoodrich         1000    0.05      1.0      0.05        0         NA        NA
1         dason         1000    0.25      5.0      0.16        0         NA        NA
2         tyler         1000    0.11      2.2      0.11        0         NA        NA
bryangoodrich's horse wins by a nose.

Gosh I love this benchmark function :)
 

bryangoodrich

Probably A Mammal
#10
trinker said:
I didn't know about chron (or forgot). I attempted this with as.POSIXct but couldn't get the results I wanted. Nice info about chron.
Did you convert the factors to character first? I had that problem when dealing with POSIX classes. It didn't seem to be a problem in this case for chron. While it may not be in base R, it is definitely worth having if you're dealing with date data! Also, if you're not dealing with time zones, stay away from POSIX. They can be such a pain! If I had to pick one, I'd probably go with POSIXlt. It keeps the date-time as a list, so each component (hour, minute, and so on) is accessible like any list element, which gives you easy access to parts of the date. POSIXct keeps it as a raw number (seconds since 1970-01-01 UTC), meaning that you have to use methods to extract information. It just depends on what you want to do, of course.
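
A small sketch of that difference, using one of the timestamps from the question. The variable names `lt` and `ct` and the choice of `tz = "UTC"` are assumptions made for illustration:

```r
# One timestamp in both representations; tz = "UTC" keeps the result reproducible
lt <- as.POSIXlt("1992-02-27 23:03:20", tz = "UTC")
ct <- as.POSIXct(lt)

# POSIXlt is a list of components (mon is 0-based, year counts from 1900)
lt$hour          # 23
lt$mon + 1       # 2
lt$year + 1900   # 1992

# POSIXct is a single number of seconds; use format() to pull pieces out
as.numeric(ct)
format(ct, "%H")  # "23"
```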

Also, did you specify the full format with POSIX? The other thing I liked about chron is the intuitive input. Usually POSIX and Date classes are picky about the format, which usually isn't our standard mm/dd/yy format! I rarely get to just go as.Date(x)! Well, unless I've handled the data because I make sure now to put it into a good POSIX form ("yyyy-mm-dd")
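
To illustrate the point about formats, a short sketch (the names `d1` and `d2` are made up for this example): a string that is already yyyy-mm-dd goes straight through as.Date(), while mm/dd/yy needs its format spelled out.

```r
# Already yyyy-mm-dd: as.Date() needs no format argument
d1 <- as.Date("1992-02-27")

# mm/dd/yy has to be described explicitly
d2 <- as.Date("02/27/92", format = "%m/%d/%y")

d1 == d2
# [1] TRUE
```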