time series objects

noetsi

Fortran must die
#1
I am trying to create a time series object. The data is in the file "S:\\CIU\\A2_Routine Reports Library\\R\\DataforTS.xlsx"

I run
fts<-ts("S:\\CIU\\A2_Routine Reports Library\\R\\DataforTS.xlsx"[,2],start=c(2014,12),frequency=12)

and get
Error in "S:\\CIU\\A2_Routine Reports Library\\R\\DataforTS.xlsx"[, 2] :
incorrect number of dimensions

the table has two fields. The first is a date, the second spending. The link I am using that describes the ts function says

> usnim_ts = ts(usnim_2002[, 2], start = c(2002, 1), frequency = 4)

[which I convert to my data that is monthly]

The function ts() takes in three arguments:

  • data is set to everything in usnim_2002 except for the date column; it isn't needed since the ts object will store time information separately.
  • start is set to the form c(year, period) to indicate the time of the first observation. Here, January corresponds with period 1; likewise, a start date in April would refer to 2, July to 3, and October to 4. Thus, period corresponds to the quarter of the year.
  • frequency is set to 4 because the data are quarterly.
https://campus.datacamp.com/courses/forecasting-in-r/exploring-and-visualizing-time-series-in-r?ex=2
 

noetsi

Fortran must die
#2
while I am at it I run

mydata <- read.xlsx("S:\\CIU\\A2_Routine Reports Library\\R\\DataforTS.xlsx", sheetname=”Non-PREETS”)
and get
Error: unexpected input in "mydata<-read.xlsx("S:\\CIU\\A2_Routine Reports Library\\R\\DataforTS.xlsx", sheetname=”"

I spelled the location right...
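
One likely cause, offered as a guess rather than a confirmed diagnosis: the error message stops right at the curly quote (”) after sheetname=, and R only accepts straight quotes (" or ') around strings. The sketch below uses a made-up function name, read_sheet_stub, purely to show that R's parser rejects the curly quotes and accepts the straight ones. Nothing is evaluated and no file is read.

```r
# Two hypothetical one-liners held as text so they can be handed to the parser.
# read_sheet_stub is a made-up name used only for illustration.
curly    <- "x <- read_sheet_stub(sheet = \u201dNon-PREETS\u201d)"
straight <- "x <- read_sheet_stub(sheet = \"Non-PREETS\")"

# parse() only checks syntax; tryCatch() captures the error instead of stopping.
curly_result    <- tryCatch(parse(text = curly), error = function(e) e)
straight_result <- tryCatch(parse(text = straight), error = function(e) e)

inherits(curly_result, "error")     # did the curly-quote version fail to parse?
inherits(straight_result, "error")  # did the straight-quote version fail to parse?
```

Retyping the quotes in the R editor instead of pasting them from Word or an email is usually enough to get rid of this kind of error.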
 

noetsi

Fortran must die
#3
Strange, this works perfectly:
mydata<-read.csv(file="DataforTS.csv")
head(mydata)
     Month         Spend
1 1-Dec-14 $5,790,000.00
2 1-Jan-15 $5,114,841.08
3 1-Feb-15 $7,240,820.54
4 1-Mar-15 $7,482,837.26
5 1-Apr-15 $6,640,341.00
6 1-May-15 $5,476,163.28

But when I convert it to time series data it is completely wrong.

mydatats<- ts(mydata,frequency=12)
head(mydatats)
[1,] 12 12
[2,] 24 7
[3,] 18 26
[4,] 40 28
[5,] 1 20
[6,] 46 8

I have no idea what is going wrong
 

trinker

ggplot2orBust
#4
usnim_2002[, 2]: above you grabbed one column, a vector. Do the same for this new data set with Spend.
start = c(2002, 1): above you specified a start date (year, period), but you don't this time.

Also I think your data is character. Maybe use lapply(mydata, class) to see.
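
A minimal sketch of that check, using a small data frame typed by hand from the head() output above as a stand-in for the real CSV (so the values are only the first three rows, and the classes on your machine may differ, e.g. "factor" on older R versions):

```r
# Stand-in for read.csv("DataforTS.csv"); typed from the head() output.
mydata <- data.frame(
  Month = c("1-Dec-14", "1-Jan-15", "1-Feb-15"),
  Spend = c("$5,790,000.00", "$5,114,841.08", "$7,240,820.54"),
  stringsAsFactors = FALSE
)

# One class per column. "character" (or "factor") means R is not
# treating the column as a number.
col_classes <- sapply(mydata, class)
col_classes

# Quick yes/no check for the column that matters here.
is.numeric(mydata$Spend)
```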
 

noetsi

Fortran must die
#5
I am not sure how I get the second column, trinker, but I will try to find out. The documentation I looked at did not specify either.

My data is numeric in SAS and excel. I am not sure how R is seeing it.
 

trinker

ggplot2orBust
#6
It does, but subtly: it uses indexing, as in [, 2]

So this will likely work:

ts(mydata[['Spend']],frequency=12)

But you need to specify start = c(YOUR_START_YEAR, YOUR_START_MONTH)

The dollar signs in the head you show above indicate character. R stores numeric/floats without dollar signs.
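
Putting those pieces together, here is a rough sketch under some assumptions: the six values are typed from the head() output above, and the pattern "[$, ]" is my own choice for stripping the dollar sign, commas and spaces while keeping the decimal point. Adjust both to your real data.

```r
# Stand-in for the Spend column as it comes out of the CSV.
spend_raw <- c("$5,790,000.00", "$5,114,841.08", "$7,240,820.54",
               "$7,482,837.26", "$6,640,341.00", "$5,476,163.28")

# Remove "$", "," and spaces only, then convert the text to numbers.
spend_num <- as.numeric(gsub("[$, ]", "", spend_raw))

# Monthly series whose first observation is December 2014.
spend_ts <- ts(spend_num, start = c(2014, 12), frequency = 12)

spend_ts         # prints with month labels
start(spend_ts)  # first time point as c(year, period)
```

Naming the start argument (start = ...) rather than passing it by position makes it harder to put the values in the wrong slot.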
 

noetsi

Fortran must die
#7
> mydata<-read.csv(file="DataforTS.csv")
> Mytsdata<- ts(mydata[['Spend']],frequency=12,C(2014,12))
Error in C(2014, 12) : object not interpretable as a factor

But when I do Mytsdata<-ts(mydata[['Spend']],frequency=12) it works.

Now I just need to know why and how to use the above. I am not sure what the function that works is actually doing.
 

trinker

ggplot2orBust
#8
mydata[['Spend']]: this grabs the Spend column by name. It is almost identical to mydata[, 2] but safer in that it's guaranteed to return the vector, not a list.

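Error in C(2014, 12) : object not interpretable as a factor. R is case sensitive: C is not c. C() is a different function (it sets contrasts on a factor), which is why the message talks about factors.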
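
A small sketch of the indexing difference, on a made-up two-row data frame. For a plain data.frame the two forms return the same vector; the caveat is that some data-frame-like classes (tibbles, for example) keep a data frame when you use [, 2], while [[ ]] still returns the column itself.

```r
# Made-up data frame with the same column names as the thread.
mydata <- data.frame(
  Month = c("1-Dec-14", "1-Jan-15"),
  Spend = c(5790000.00, 5114841.08)
)

by_name <- mydata[["Spend"]]          # the column itself, looked up by name
by_pos  <- mydata[, 2]                # a vector for a plain data.frame
kept_df <- mydata[, 2, drop = FALSE]  # stays a one-column data.frame

identical(by_name, by_pos)  # same vector either way here
class(kept_df)
```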
 

noetsi

Fortran must die
#9
Not used to languages being case sensitive like that. Something to keep in mind as I go forward.

SAS commands don't pay attention to case for the most part. count and COUNT generate the same results.

Mytsdata<- ts(mydata[['Spend']],frequency=12,c(2014,12)) works

Mytsdata<-ts(mydata,frequency=12,c(2014,12)) does not. So for my data I have to use the above formulation.

Why are there two sets of [] around Spend? Is this an R formatting function?
 

noetsi

Fortran must die
#10
I run this command on my time series object: str(Mytsdata)

 Time-Series [1:65, 1:2] from 2015 to 2020: 12 24 18 40 1 46 35 30 7 61 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:2] "Month" "Spend"

I assume the $ means a character field, which I don't get since neither is character in Excel or SAS. I assume the issue is with the CSV. I also don't understand why it shows 2015 as the start when the data, as R shows elsewhere, starts in 2014, or what the NULL column means.
 

noetsi

Fortran must die
#11
Anyone know why this does not rename the two variables?

names(Mytsdata)<-c("m","sp")
> head(Mytsdata)

the last command shows nothing changed.
 

trinker

ggplot2orBust
#12
> Mytsdata<-ts(mydata,frequency=12,c(2014,12)) does not. So for my data I have to use the above formulation.

This is because you passed the whole data frame into the data argument, and usually you only want to pass a single vector (one column) from the data frame.
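
A guess at where numbers like 12, 24, 18 come from: when ts() is handed a whole data frame, it converts it to a numeric matrix, and non-numeric columns end up as integer codes (the position of each value among the sorted distinct values) rather than the values themselves. The sketch below calls data.matrix() on three made-up rows to show the effect. The columns are stored as factors here, which is an assumption about how your read.csv() call read them.

```r
# Three made-up rows, stored as factors the way older read.csv() defaults did.
toy <- data.frame(
  Month = c("1-Dec-14", "1-Jan-15", "1-Feb-15"),
  Spend = c("$5,790,000.00", "$5,114,841.08", "$7,240,820.54"),
  stringsAsFactors = TRUE
)

# Each column becomes integer codes, not dates or dollar amounts.
codes <- data.matrix(toy)
codes

# ts() on the whole data frame carries the same codes along.
toy_ts <- ts(toy, frequency = 12)
toy_ts
```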

> Why are there two sets of [] around spend? Is this a R formatting function?

The [] are not unique to R (Python uses brackets for indexing as well, for example), but I haven't seen them work exactly this way elsewhere. In R, when you use [] on a data.frame you get a data.frame back. (The same is true of lists, and really a data frame is just a special version of a list; save that away for someday, but not today.) When you use [[]] on a data.frame, you get an atomic vector back; in the case below it is of class character.

Code:
mydata <- structure(list(x = c("1-Dec-14", "1-Jan-15", "1-Feb-15", "1-Mar-15",
"1-Apr-15", "1-May-15"), y = c(" $5,790,000.00", " $5,114,841.08",
" $7,240,820.54", " $7,482,837.26", " $6,640,341.00", " $5,476,163.28"
)), row.names = c(NA, -6L), class = "data.frame")

mydata
Gives:
Code:
         x              y
1 1-Dec-14  $5,790,000.00
2 1-Jan-15  $5,114,841.08
3 1-Feb-15  $7,240,820.54
4 1-Mar-15  $7,482,837.26
5 1-Apr-15  $6,640,341.00
6 1-May-15  $5,476,163.28
Check the class:
Code:
class(mydata)
## [1] "data.frame"
Now look at []:

Code:
mydata[2]
Which gives:
Code:
               y
1  $5,790,000.00
2  $5,114,841.08
3  $7,240,820.54
4  $7,482,837.26
5  $6,640,341.00
6  $5,476,163.28
It's still a data.frame, not an atomic vector as seen below:
Code:
class(mydata[2])
## [1] "data.frame"
Next look at [[]]:
Code:
mydata[[2]]
Now it has the look of an atomic vector, no column structure.
Code:
[1] " $5,790,000.00" " $5,114,841.08" " $7,240,820.54" " $7,482,837.26"
[5] " $6,640,341.00" " $5,476,163.28"
The class check verifies:
Code:
class(mydata[[2]])
## [1] "character"
 

trinker

ggplot2orBust
#13
See what this gives you:

Code:
names(Mytsdata)
That's because a ts object doesn't use names.

dput() is useful beyond str() to see what the object looks like.

Code:
dput(Mytsdata)
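
If the goal was to rename the columns of a two-column series, a sketch along these lines might do it. A multi-column ts is a matrix underneath, so its labels live in dimnames: the NULL in the str() output is the (absent) row names, and the second slot holds the column names. The object two_col and its values are made up for illustration.

```r
# Made-up two-column monthly series with the same column names as the thread.
two_col <- ts(
  matrix(1:6, ncol = 2, dimnames = list(NULL, c("Month", "Spend"))),
  start = c(2014, 12), frequency = 12
)

# colnames<- changes the column labels; names<- does not touch them.
colnames(two_col) <- c("m", "sp")

colnames(two_col)
head(two_col)
```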


What were you trying to do here:
Code:
names(Mytsdata)<-c("m","sp")
> head(Mytsdata)

the last command shows nothing changed.
 

noetsi

Fortran must die
#14
head(mydata)

     Month         Spend
1 1-Dec-14 $5,790,000.00
2 1-Jan-15 $5,114,841.08
3 1-Feb-15 $7,240,820.54
4 1-Mar-15 $7,482,837.26
5 1-Apr-15 $6,640,341.00
6 1-May-15 $5,476,163.28

these are both strings. I wanted to change Month to a date. I do

mydata$Month <-as.Date(mydata$Month,format= "%d/%m/%y")
and it works
str(mydata$Month)
Date[1:65], format:....

Ok I got this half solved. But now I need to change Spend which is a character into a number.

The data looks like this (Spend is the 2nd variable).
1 2020-01-12 5,790,000.00
2 2020-01-01 5,114,841.08

I need to get rid of the , and the periods and columns. All of them

I tried a lot of different ways, but this works.

mydata$Spend<-as.numeric(gsub("[[:punct:]]", "",mydata$Spend))

In about a year I will be actually able to run data....
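
Two things in this post look worth double-checking, sketched below on single values taken from the thread. First, with format "%d/%m/%y" the lowercase %y reads only two digits of the year, so "2014" is read as "20" and 12/1/2014 turns into 2020-01-12, which matches the dates shown above. Second, [[:punct:]] removes the decimal point along with the commas, so every amount comes out 100 times too large. The replacement pattern "[$,]" is my own suggestion, not something from the documentation the thread refers to.

```r
# Date pitfall: %y takes two digits, %Y takes the full four-digit year.
wrong_date <- as.Date("12/1/2014", format = "%d/%m/%y")
right_date <- as.Date("12/1/2014", format = "%m/%d/%Y")
wrong_date
right_date

# Number pitfall: [[:punct:]] deletes "." as well as "$" and ",".
too_big <- as.numeric(gsub("[[:punct:]]", "", "$5,790,000.00"))
correct <- as.numeric(gsub("[$,]", "", "$5,790,000.00"))
too_big
correct
```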
 

noetsi

Fortran must die
#15
I was not sure if this is wrong or simply how time series are formatted, but there are 16405 days between 1/1/1970 and 12/1/2014, which is how R counts dates. I need to learn how to get the ts function to show months and dates in standard English, not numbers :p

My original data looks like this
1 12/1/2014 5,790,000.00
2 1/1/2015 5,114,841.08
3 2/1/2015 7,240,820.54
4 3/1/2015 7,482,837.26
5 4/1/2015 6,640,341.00
6 5/1/2015 5,476,163.28

I did this to make it a date


mydata$Month <-as.Date(mydata$Month,format= "%m/%d/%Y")
Not sure why it makes it year first then month. Or if there is actually anything wrong with this.
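
As far as I can tell nothing is wrong here: a Date always prints in year-month-day order, whatever format it was read with. The format argument of as.Date() only describes the input. To display the date differently, format() turns it back into text, as in this small sketch (the variable names are made up):

```r
# Parse a US-style date; the format string describes the input text.
d <- as.Date("12/1/2014", format = "%m/%d/%Y")
d  # printed as year-month-day

# format() produces text in whatever layout you ask for.
us_style   <- format(d, "%m/%d/%Y")
month_year <- format(d, "%B %Y")  # month name follows your locale settings
us_style
month_year
```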

head(mydata)
       Month     Spend
1 2014-12-01 579000000
2 2015-01-01 511484108
3 2015-02-01 724082054
4 2015-03-01 748283726
5 2015-04-01 664034100
6 2015-05-01 547616328

> Mytsdata<- ts(mydata,frequency=12,c(2014,12))

I am not sure if that is correct or Mytsdata<-ts(mydata$Spend,frequency=12,c(2014,12))

> head(Mytsdata)
     Month     Spend
[1,] 16405 579000000
[2,] 16436 511484108
[3,] 16467 724082054
[4,] 16495 748283726
[5,] 16526 664034100
[6,] 16556 547616328
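
A sketch of the second form, under the assumption that only Spend belongs in the series. The Month column is not needed because ts() builds its own time index from start and frequency. If Month is included, it is stored as its underlying day count, which is where 16405 comes from. The six values are typed from the first head() output in the thread.

```r
# A Date is stored as days since 1970-01-01.
day_number <- as.numeric(as.Date("2014-12-01"))
day_number

# Spend only, with the time index supplied by start and frequency.
spend <- c(5790000.00, 5114841.08, 7240820.54,
           7482837.26, 6640341.00, 5476163.28)
spend_ts <- ts(spend, start = c(2014, 12), frequency = 12)

print(spend_ts)  # monthly series print with month labels
first_time <- time(spend_ts)[1]  # December 2014 as a decimal year
first_time

# Pick out January to March 2015 by (year, month).
q1 <- window(spend_ts, start = c(2015, 1), end = c(2015, 3))
q1
```

The decimal time of the first point is 2014 + 11/12, roughly 2014.917, which is probably why str() summarised the series as starting "from 2015".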
 