Stochastic differential equation data simulation

#1
I have written the following which takes a bit of time to run (which is to be expected) but I am just wondering if I could speed up the time it takes if I make my code more efficient. Any help with this would be much appreciated!

Code:
require(deSolve)

S <- c(y=.1, b=.2, o=.7)

times <- seq(0, 0.01, by=0.01)

rpsCont <- function(t, S, P){
           with(as.list(c(S)),{
	
                 w_y <- pay[1,1]*y + pay[1,2]*b + pay[1,3]*o
                 w_b <- pay[2,1]*y + pay[2,2]*b + pay[2,3]*o 
                 w_o <- pay[3,1]*y + pay[3,2]*b + pay[3,3]*o
                 w_bar <- y*w_y + b*w_b + o*w_o

                 dy <- y * (w_y-w_bar)
                 db <- b * (w_b-w_bar)
                 dO <- o * (w_o-w_bar)

           list(c(dy, db, dO))

           })

}

plotRPSerror <- function(){
	out <<- as.data.frame(ode(y=S, times=times, func=rpsCont, parms=P))
	
	outOld <- as.data.frame(ode(y=S, times=times, func=rpsCont, parms=P))
	
	for(i in 3:10001){
		
		er <- rdirichlet(1, c(1,1,1))
		
		K <- as.numeric((outOld[2,2:4] + er) / 2)
		
		S <- c(y=K[1], b=K[2], o=K[3])
		
		rpsCont <- function(t, S, P){
			with(as.list(c(S)),{
			
			w_y <- pay[1,1]*y + pay[1,2]*b + pay[1,3]*o
			w_b <- pay[2,1]*y + pay[2,2]*b + pay[2,3]*o
			w_o <- pay[3,1]*y + pay[3,2]*b + pay[3,3]*o
			w_bar <- y*w_y + b*w_b + o*w_o
			
			dy <- y * (w_y - w_bar)
			db <- b * (w_b - w_bar)
			dO <- o * (w_o - w_bar)
			
			list(c(dy, db, dO))
			
			}) #end with as.list
			
			} #end rpsCont
						
			outTemp <- as.data.frame(ode(y=S, times=times, func=rpsCont, parms=P))
						
			out <<- rbind(out, outTemp[2,])
			
			outOld <- outTemp
			
			} #end for loop
		
		out1 <<- cbind(time=seq(0,100,by=0.01), out)
		
		
	
matplot(out1[,"time"], out1[,2:4], type = "l", xlab = "time", ylab =
"frequency", main = "RPS", lwd = 2)
legend("topleft", c("o", "b"), col = 1:2, lty = 1:2)
} #end whole function
I apologize in advance, I know that I can tend to write pretty inefficient code but I am relatively new to R.
 

Link

Ninja say what!?!
#2
Ahhh...efficient code writing. This is definitely one of my interests.

First off, I'm a little confused as to why you wrote "times <- seq(0, 0.01, by=0.01)". This will give you just two values: 0 and 0.01. Did you simple mean to simplify it to post?

Second, if you plan on having other people read your code, it's good practice to start commenting generously. I see a lot of equations and variables but don't know the purpose of them.

Next, you should try as much as possible to avoid writing for loops. That's likely what's slowing you down. R is good at using vectors and though it takes practice, you can turn ALMOST any for loop into a vector for R. Look into commands such as apply, sapply, lapply, and tapply. Once you get good at lapply, you can even upgrade to mclapply which runs multiple instances of R for multicore processors.

Also, why do you define rpsCont twice?
 
#3
First off, I'm a little confused as to why you wrote "times <- seq(0, 0.01, by=0.01)". This will give you just two values: 0 and 0.01. Did you simple mean to simplify it to post?
This is because I need to add the error term in at every step of the sequence (in this case t goes from 0 to 100 by 0.01 increments), so I set it to output two time steps, the state (the previous period's strategy with error term added) and the next period, then I used rbind to add that second time step to the matrix of every period.

Also, why do you define rpsCont twice?
I do this so that I can start with the first two time steps (t=0 and t=0.01) before getting to the loop where I am adding the error terms.

I will take a look at the apply functions to see if I can figure out how to make it more efficient that way.

I will also use more comments in my code next time, thanks for reminding me.
 

Dason

Ambassador to the humans
#4
This is because I need to add the error term in at every step of the sequence (in this case t goes from 0 to 100 by 0.01 increments), so I set it to output two time steps, the state (the previous period's strategy with error term added) and the next period, then I used rbind to add that second time step to the matrix of every period.
Sure. But I think what they're asking is why you used seq instead of just using c(0, .01).

Also... if you define the function outside the loop you don't need to define it inside the loop again. Unless there is something that I'm missing where you're using certain properties about scoping when defining functions.
 
#5
Thanks for the suggestions, I see now (with a little help from Dason) that you are right about both times and rpsCont. This is what I have now:

Code:
function(){
	out <<- as.data.frame(ode(y=S, times=times, func=rpsCont, parms=P))
	
	outOld <- as.data.frame(ode(y=S, times=times, func=rpsCont, parms=P))
	
	for(i in 3:10001){
		
		er <- rdirichlet(1, c(1,1,1))
		
		K <- as.numeric(outOld[2,2:4] * 0.999 + er * 0.001)
		
		S <- c(y=K[1], b=K[2], o=K[3])
						
		outTemp <- as.data.frame(ode(y=S, times=times, func=rpsCont, parms=P))
						
		out <<- rbind(out, outTemp[2,])
			
		outOld <- outTemp
			
			} #end for loop
		
		out1 <<- cbind(time=seq(0,100,by=0.01), out)
		
		
	
matplot(out1[,"time"], out1[,2:4], type = "l", xlab = "time", ylab =
"frequency", main = "RPS", lwd = 2)
legend("topleft", c("y", "b"), col = 1:2, lty = 1:2)
} #end whole function
I'm not comfortable enough with the apply functions to work with them yet... That being said, is this fairly efficient? I know the loop will slow it down a lot but I am getting the following system time (which seems pretty slow to me):

Code:
system.time(plotRPSerror())
   user  system elapsed 
108.928  22.166 133.890
 

Dason

Ambassador to the humans
#6
One thing that is good to avoid is rbind and cbind. They're alright if you just want to quickly add something. But in this case you know what the dimension of the final output will be so the better thing would be to initialize 'out' and 'out1' to the sizes they're going to be at the end and then just manipulate the values during the loop.

Also can I ask why you're using global assign statements (things like "<<-")? Typically it's good practice to avoid them. Sometimes they're a necessary evil but it's just good coding practice to avoid them if you can. A better option might be to just use local assignment inside the function and return the values at the end of the function possibly in a list.
 
#7
One thing that is good to avoid is rbind and cbind
I have since made the size of the loop variable, and I'm not sure how to create that matrix.

Thanks for the suggestion with respect to the global assign statement. Out of curiosity, why is it a better to not use them? Do they slow things down?
 

Dason

Ambassador to the humans
#8
Even if the loop size is variable you should still be able to figure out how large the resulting matrix should be. For example if I knew I wanted a matrix with 4 columns but I wanted to have a variable number of rows I could do something like this:
Code:
nrows <- 1000
output <- matrix(0, nrows, 4)
for(i in 1:nrows){
  output[i, ] <- rnorm(4, mean = 0, sd = i)
}
 

Dason

Ambassador to the humans
#10
Indeed. It has to do with the way R is allocating and filling memory. The way you were doing it before you had to do A LOT of copying. Essentially every time through the loop you'd end up copying everything you had already added to the output. If you preallocate you don't do that.