Adding new column in a dataframe in R

#1
I am trying to add new column to a dtaframe , this is my dataframe:

EquipmentID ZwaveType ProprietesName Value Date
63 0 2 1 2014-03-01 06:45:00
1182 0 2 1 2014-03-01 07:00:00
112678 0 2 1 2014-03-01 07:15:00
1167 0 2 1 2014-03-01 07:14:30
112678 0 2 0 2014-03-01 07:25:00
63 0 2 0 2014-03-01 07:45:00

so I think about browsing my dataframe and I'll generate a new column, for each date i will associate a value, and for the similar dates I will associate the same value, but I don't know until now if it's possible to do this business with R ??
 

JesperHP

TS Contributor
#2
this website gives you an easy introduction on Dates and Times in R
http://www.stat.berkeley.edu/classes/s133/dates.html

among other things it says:

Internally, Date objects are stored as the number of days since January 1, 1970, using negative numbers for earlier dates. The as.numeric function can be used to convert a Date object to its internal form.
so convert to Date and then to numeric variable:

Code:
#Example
#Some dates as character
mydates=c("04-02-1976","04-02-1976","05-02-1976","06-02-1976")
##converts to Date
as.Date(mydates,format="%m-%d-%Y")
##converts to Date and then numeric vector
as.numeric(as.Date(mydates,format="%m-%d-%Y"))
store the numeric vector and bind it to data.frame if needed...let df be the dataframe an df$date column of dates in df

Code:
df=data.frame(df,numerictime=as.numeric(as.Date(df$date)))
 
Last edited:
#3
Thanks for help but at first i have to browse my dataframe because I don't have a fixed vector with date value like this:
#Some dates as character
mydates=c("04-02-1976","04-02-1976","05-02-1976","06-02-1976")
an about the function as.numeric it is for converting Date in format "%m-%d-%Y" but I have th format DateTime like this "2014-03-01 06:45:00"
so Ihave to bring these value from my dataframe then store them in a vector, can I do this in R??
 
Last edited:

JesperHP

TS Contributor
#4
first i have to browse my dataframe
What exactly is meant by browsing the data.frame is unclear ... somekind of search and get operation.... but for the search to be efficient and valid certain assumptions on structure of data would have to be made others avoided depending on the exact structure or possible structure of your data.

Okay... so in order for anyone to be able to help you you would then need to create some artificial data with the same structure as the actual data but preferably simpler in nature (possible just a subset of your actual data). The example data should be reproducible in the form delivered here on talkstats meaning I should be able to copy paste the code creating the artificial data in my R console and then ending up with an example of your data....This is your way of telling me THIS IS WHAT IVE GOT.....assuming the stuff you copy pasted is your data structure something like this is what you should have done:

Code:
EquipmentID=c(63,1182,112678,1167,112678,63)
ZwaveType=rep(0,6)
ProprietesName=rep(2,6)
Value=c(1,1,1,1,0,0)
Date=c("2014-03-01 06:45:00","2014-03-01 07:00:00","2014-03-01 07:15:00","2014-03-01 07:14:30","2014-03-01 07:25:00","2014-03-01 07:45:00")
df=data.frame(EquipmentID, ZwaveType, ProprietesName, Value, Date)
okay...here there is a column named Date so assuming your data has this form it is wrong that
I don't have a fixed vector with date value
you have the vector
Code:
df$Date
if this is not the case you have to provide more illustrative example of your data.

True df$Date does not have the same format as in my example but that is exactly why there is an argument called format so that you can use it to tell R what format you data has......it is explained on the webpage referred to in previous post in this thread. Using what is explained there:

Code:
mylist=strsplit(df$Date," ")
Creates a list where date information is separated (split) from time information.

Now you can use R's functions for manipulating lists.... see ?lapply and ?sapply ... sapply applies function to components of list and simplifies the return...

Code:
mycharactervector=sapply(mylist,FUN=function(x) x[1])
and converting to value:

Code:
as.numeric(as.Date(mycharactervector,format="%Y-%m-%d"))

Perhaps you want not only to
for the similar dates I will associate the same value
but to associate same value for same date:time .... one way would be to think of the charactervector as defining a factor (different strings will be different levels of the factor variable) and then forcing the factor to numeric:

Code:
as.numeric(as.factor(mycharactervector))
but this probably want work because you have to preserve the timeorder.....

so still best guidance if you want to use both date and time is to read the webpage I linked to it says line 1:
the contributed library chron handles dates and times
so install package chron and load and follow the examples on the webpage ....
The chron function converts dates and times to chron objects.
and having a chronbject
Chron values are stored internally as the fractional number of days from January 1, 1970. The as.numeric function can be used to access the internal values.
hence as.numeric can convert to numeric vector respecting the time order...

So copy pasting from the webpage I linked you can do:

Code:
install.packages("chron")
library(chron)
time.df=t(as.data.frame(strsplit(Date," ")))
rownames(time.df)=NULL
datetime = chron(dates=time.df[,1],times=time.df[,2],
                  format=c('y-m-d','h:m:s'))
as.numeric(datetime)
 
Last edited:
#5
Thanks for your help .
I actually connected R and sql server to retrieve the database table of data:

code:
#load RODBC library
library("RODBC")
#Connection with SQL Server
connection<-odbcConnect("ChifcoProd-2013-12-12-11-37")
#Retrieve data from database,load data in data.frame
dataset<-sqlQuery(connection,"Select dbo.Equipments.EquipmentID,Name,ZwaveType,ProprietesName ,Value,Date from dbo.Consumptions,dbo.PropertiesInstances,dbo.EquipmentProperties,dbo.Equipments where dbo.Consumptions.PropertiesInstanceID = dbo.PropertiesInstances.PropertiesInstanceID and dbo.PropertiesInstances.EquipmentPropertiesID = dbo.EquipmentProperties.EquipmentPropertiesID and dbo.EquipmentProperties.EquipmentID = dbo.Equipments.EquipmentID and dbo.Consumptions.Unit=8 ")

So I want to apply the som algorithm for learning and data classification that is why I need a numeric datafrrame before applying the algorithm.
I have used this code to adding the numeric column to the dataframe :
code:
mydates=c(" 2014-03-01 06:45:00","2014-03-01 07:00:00","2014-03-01 07:15:00","2014-03-01 07:14:30","2014-03-02 06:45:00","2014-03-02 07:00:00")
##converts to Date
as.POSIXct(mydates,format="%Y-%m-%d %hh:%mm:%ss")
##converts to Date and then numeric vector
as.numeric(as.POSIXct(mydates,format="%Y-%m-%d %hh:%mm:%ss"))
dataset=data.frame(dataset,numerictime=as.numeric(as.POSIXct(dataset$Date)))

and then I have selected Column that Ineed to applying the standardization:

code:
#normalize data before feeding som algorithm
normalize(dataset, byrow=TRUE)
 
Last edited: