R plot

hlsmith

Omega Contributor
#1
There is a function "plotcp" which you feed your rpart output into and it generates the following plot:

1518793725829.png

I would like to replicate this using the following dataset. Of note, the whiskers are just the xerror +/- xstd. The xstd are actually SE values since they are taken from cross-validation. I am goinng to try and find the source code for "plotcp" to see if I can adapt that. Below is my data and additional code that may help. I get this is a simple plot, but with my lack of regular R use, I thought you all code speed up the process. P.S., the "plotcp" may do a little extra, but I just want the follow graph similarly. Thanks!

Code:
d <- read.table(header = TRUE, text = "
CP nsplit rel_error  xerror     xstd
0.13210900      0  1.000000 1.00000 0.029985
0.05924171      2  0.735782 0.76896 0.027242
0.03199052      3  0.676540 0.67654 0.025900
0.00394945      4  0.644550 0.64455 0.025397
0.00296209      9  0.616114 0.65284 0.025529
0.00236967     11  0.610190 0.66943 0.025790
0.00203114     12  0.607820 0.68009 0.025955
0.00197472     22  0.585308 0.68365 0.026009
0.00177725     25  0.579384 0.68246 0.025991
0.00165877     79  0.422986 0.67417 0.025864
0.00157978     87  0.408768 0.67773 0.025918
0.00118483     93  0.399289 0.68365 0.026009
0.00098736    148  0.330569 0.73697 0.026796
0.00094787    174  0.299763 0.76540 0.027194
0.00088863    207  0.267773 0.77014 0.027259
0.00084631    236  0.241706 0.78910 0.027514
0.00078989    243  0.235782 0.78910 0.027514
0.00059242    297  0.178910 0.80095 0.027671
0.00047393    398  0.112559 0.85427 0.028349
0.00039494    410  0.106635 0.86256 0.028450
0.00035545    444  0.090047 0.86137 0.028436
0.00010000    462  0.082938 0.86730 0.028508
")

#Typical code to get the plot via rpart output
print(crs$rpart)
printcp(crs$rpart)
plotcp(crs$rpart, col=2, upper = c("size", "splits", "none"))  # visualize cross-validation results
grid()
 
Last edited:

hlsmith

Omega Contributor
#2
Update, "plotcp" function code that needs adapted or referenced (sorry formatting didn't copy, let me know how to do that in the future):

https://www.rdocumentation.org/packages/rpart/versions/4.1-12/source

Code:
## Contributed by B.D. Ripley 97/07/17
##
plotcp <- function(x, minline = TRUE, lty = 3, col = 1,
upper = c("size", "splits", "none"), ...)
{
dots <- list(...)
if (!inherits(x, "rpart")) stop("Not a legitimate \"rpart\" object")
upper <- match.arg(upper)
p.rpart <- x$cptable
if (ncol(p.rpart) < 5L)
stop("'cptable' does not contain cross-validation results")
xstd <- p.rpart[, 5L]
xerror <- p.rpart[, 4L]
nsplit <- p.rpart[, 2L]
ns <- seq_along(nsplit)
cp0 <- p.rpart[, 1L]
cp <- sqrt(cp0 * c(Inf, cp0[-length(cp0)]))
if (! "ylim" %in% names(dots))
dots$ylim <- c(min(xerror - xstd) - 0.1, max(xerror + xstd) + 0.1)
do.call(plot, c(list(ns, xerror, axes = FALSE, xlab = "cp",
ylab = "X-val Relative Error", type = "o"), dots))
box()
axis(2, ...)
segments(ns, xerror - xstd, ns, xerror + xstd)
axis(1L, at = ns, labels = as.character(signif(cp, 2L)), ...)
switch(upper,
size = {
axis(3L, at = ns, labels = as.character(nsplit + 1), ...)
mtext("size of tree", side = 3, line = 3)
},
splits = {
axis(3L, at = ns, labels = as.character(nsplit), ...)
mtext("number of splits", side = 3, line = 3)
})
minpos <- min(seq_along(xerror)[xerror == min(xerror)])
if (minline) abline(h = (xerror + xstd)[minpos], lty = lty, col = col)
invisible()
}
P.S., this doesn't have to be in base plot.
 
Last edited:

Dason

Ambassador to the humans
#3
It looks like the code only really checks a few things: 1) that the object passed in is of class "rpart" and that the rpart$cptable output meets a few criteria. We'll just make sure to format your data in that way and then we can use the plotting function directly

Code:
fit <- list()
fit$cptable <- d
class(fit) <- "rpart"
plotcp(fit)
 

hlsmith

Omega Contributor
#4
Dason, your solution was better than mine. I had deactivated the "stops" in the function to get it to run. However, my quick and dirty fix would mean I changed the function. Your upstream solution circumnavigates that issue and is simpler.

Thanks.
 

Dason

Ambassador to the humans
#5
The only caveat is you have to be ok with lying. Because we totally don't have a truly valid "rpart" object. Only a part of one. So if you're not worried about rpart finding out that you lied to it then I think it should be ok.
 

hlsmith

Omega Contributor
#6
@Dason Follow-up question, how can I add an additional abline(v=0.0039)?
I tried adding it after plotting plotcp and by adding it within the function to not avail.
 

hlsmith

Omega Contributor
#7
It is a valid rpart table. I was just trying to mess around with the plot. Used data were generated by rpart. So I ain't gonna be afraid of no r package.