# Plots to different charts

#### obruck

##### New Member
Hi!

My situation is very much the same as in this image http://www.nature.com/bmt/journal/v40/n4/fig_tab/1705727f1.html#figure-title

I am comparing patients from two diseases (ALL, AML) with two different disease status (1,2). I am trying to calculate with cumulative incidence how many ALL and AML patients achieve the value 1 from all ALL and AML patients, respectively. However, the chart displays both values so I get 4 plots instead of 2.

What should I do to have a plot with "ALL1" and "AML1" only?

Thank you in advance for your help!

#### lilchaos

##### New Member
Not sure if i understood you correctly but maybe the easiest and fastest would be to use a subset where Disease status is only 1?

something like:

#### obruck

##### New Member
The creator of that script is actually the author of the nature I posted. I happened to need the same script. And I happen also to have very basic coding skills so I appreciate all help I get!

I tried the code you suggested and added at the very end. R definitely understood it but it did nothing to the plot. Should there be some extra coding to do to draw the new plot?

Thanks again for your interest.

As a possible alternative I came to think if something should be altered in the original Cum.Incidence coding http://www.stat.unipg.it/luca/R/CumIncidence.R ?

#### trinker

##### ggplot2orBust
This question would be improved if you told use where CumIncidence came from and used dput to share code (as I do below) not attaching a csv/xlsx. I googled and got the following script for the function you use but don't include: http://www.stat.unipg.it/luca/R/CumIncidence.R and so used that in my answer below. If I understood the problem more and I were you I'd do this myself and make it tailored to my own needs maybe with ggplot2's geom_step function.

Here's my plot:

All I have done is tweaked the following lines of code in the function. FUll code and reproducible data example below that (changes in red):

Code:
matplot(time, base::t(x$est)[COLOR="red"][,1:2][/COLOR], type="s", ylim = c(0,1), xlab = xlab, ylab = ylab, xaxs="i", yaxs="i", col = col, lty = lty, lwd = lwd) legend("topleft", legend = rownames(x$est)[COLOR="red"][1:2][/COLOR], x.intersp = 2,
bty = "n", xjust = 1, col = col, lty = lty, lwd = lwd)
Full code:

Code:
#############################################################################
#                                                                           #
#                 CUMULATIVE INCIDENCE CURVES IN R                          #
#                                                                           #
# Written by Luca Scrucca                                                   #
#                                                                           #
# Reference:                                                                #
# Scrucca L., Santucci A., Aversa F. (2007) Competing risks analysis using  #
#   R: an easy guide for clinicians. Bone Marrow Transplantation, 40,       #
#   381--387.                                                               #
#############################################################################
# ver. 1.1 Feb 2008
#          - allow group to be missing
#          - if t is provided both computation and plots use t as time points
#          - allow col, lwd to be used for curves with confidence bands
#          - fix some bugs in the legend
#          - added help on source code
# ver. 1.0 May 2007
#          - Version appearing in the BMT paper
#############################################################################
#
# Usage:
#
#   CumIncidence(ftime, fstatus, group, t, strata, rho = 0, cencode = 0,
#             	 subset, na.action = na.omit, level,
#                xlab = "Time", ylab = "Probability",
#                col, lty, lwd, digits = 4)
#
# Arguments:
#
# ftime = failure time variable.
# fstatus = variable with distinct codes for different causes of
#           failure and also a distinct code for censored observations.
# group	= estimates will be calculated within groups given by distinct
#         values of this variable. Tests will compare these groups. If
#         missing then treated as all one group (no test statistics).
# t = a vector of time points where the cumulative incidence function
#     should be evaluated.
# strata = stratification variable. Has no effect on estimates. Tests
#          will be stratified on this variable. (all data in 1 stratum,
#          if missing).
# rho = power of the weight function used in the tests. By default is
#       set to 0.
# cencode = value of fstatus variable which indicates the failure time
#           is censored.
# subset = a logical vector specifying a subset of cases to include in
#          the analysis.
# na.action = a function specifying the action to take for any cases
#             missing any of ftime, fstatus, group, strata, or subset.
#             By default missing cases are omitted.
# level = a value in the range [0,1] specifying the level for pointwise
#         confidence bands.
# xlab = text for the x-axis label.
# ylab = text for the y-axis label.
# col = color(s) used for plotting curves (see plot.default).
# lty = line type(s) used for plotting curves (see plot.default).
# lwd = line width(s) used for plotting curves (see plot.default).
# digits = number of significant digits used for printing values. By
#          default set at 4.
#
#############################################################################

"CumIncidence" <- function(ftime, fstatus, group, t, strata, rho = 0,
cencode = 0, subset, na.action = na.omit, level,
xlab = "Time", ylab = "Probability",
col, lty, lwd, digits = 4)
{
# check for the required package
if(!require("cmprsk"))
{ stop("Package `cmprsk' is required and must be installed.\n
See help(install.packages) or write the following command at prompt
and then follow the instructions:\n
> install.packages(\"cmprsk\")") }
# collect data
mf  <- match.call(expand.dots = FALSE)
mf[[1]] <- as.name("list")
mf$t <- mf$digits <- mf$col <- mf$lty <- mf$lwd <- mf$level <-
mf$xlab <- mf$ylab <- NULL
mf <- eval(mf, parent.frame())
g <- max(1, length(unique(mf$group))) s <- length(unique(mf$fstatus))
if(missing(t))
{ time <- pretty(c(0, max(mf$ftime)), 6) ttime <- time <- time[time < max(mf$ftime)] }
else { ttime <- time <- t }
# fit model and estimates at time points
fit   <- do.call("cuminc", mf)
tfit <- timepoints(fit, time)

# print result
cat("\n+", paste(rep("-", 67), collapse=""), "+", sep ="")
cat("\n| Cumulative incidence function estimates from competing risks data |")
cat("\n+", paste(rep("-", 67), collapse=""), "+\n", sep ="")
tests <- NULL
if(g > 1)
{ tests <- fit$Tests colnames(tests) <- c("Statistic", "p-value", "df") cat("Test equality across groups:\n") print(tests, digits = digits) } cat("\nEstimates at time points:\n") print(tfit$est, digits = digits)
cat("\nStandard errors:\n")
print(sqrt(tfit$var), digits = digits) # if(missing(level)) { # plot cumulative incidence functions if(missing(t)) { time <- sort(unique(c(ftime, time))) x <- timepoints(fit, time) } else x <- tfit col <- if(missing(col)) rep(1:(s-1), rep(g,(s-1))) else col lty <- if(missing(lty)) rep(1:g, s-1) else lty lwd <- if(missing(lwd)) rep(1, g*(s-1)) else lwd matplot(time, base::t(x$est)[,1:2], type="s", ylim = c(0,1),
xlab = xlab, ylab = ylab, xaxs="i", yaxs="i",
col = col, lty = lty, lwd = lwd)
legend("topleft", legend =  rownames(x$est)[1:2], x.intersp = 2, bty = "n", xjust = 1, col = col, lty = lty, lwd = lwd) out <- list(test = tests, est = tfit$est, se = sqrt(tfit$var)) } else { if(level < 0 | level > 1) error("level must be a value in the range [0,1]") # compute pointwise confidence intervals oldpar <- par(ask=TRUE) on.exit(par(oldpar)) if(missing(t)) { time <- sort(unique(c(ftime, time))) x <- timepoints(fit, time) } else x <- tfit z <- qnorm(1-(1-level)/2) lower <- x$est ^ exp(-z*sqrt(x$var)/(x$est*log(x$est))) upper <- x$est ^ exp(z*sqrt(x$var)/(x$est*log(x$est))) col <- if(missing(col)) rep(1:(s-1), rep(g,(s-1))) else rep(col, g*(s-1)) lwd <- if(missing(lwd)) rep(1, g*(s-1)) else rep(lwd, g*(s-1)) # plot pointwise confidence intervals for(j in 1:nrow(x$est))
{ matplot(time, cbind(x$est[j,], lower[j,], upper[j,]), type="s", xlab = xlab, ylab = ylab, xaxs="i", yaxs="i", ylim = c(0,1), col = col[j], lwd = lwd[j], lty = c(1,3,3)) legend("topleft", legend = rownames(x$est)[j], bty = "n", xjust = 1) }
# print pointwise confidence intervals
i <- match(ttime, time)
ci <- array(NA, c(2, length(i), nrow(lower)))
ci[1,,] <- base::t(lower[,i])
ci[2,,] <- base::t(upper[,i])
dimnames(ci) <- list(c("lower", "upper"), ttime, rownames(lower))
cat(paste("\n", level*100, "% pointwise confidence intervals:\n\n", sep=""))
print(ci, digits = digits)
out <- list(test = tests, est = x$est, se = sqrt(tfit$var), ci = ci)
}
# return results
invisible(out)
}

dat <- structure(list(dis = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), ftime = c(811L, 1595L, 810L, 362L, 1144L, 856L, 2923L, 741L,
927L, 672L, 1272L, 1502L, 2137L, 1388L, 2617L, 536L, 2496L, 367L,
994L, 144L, 461L, 752L, 730L, 277L, 477L, 180L, 992L, 1421L,
3218L, 1447L, 458L, 456L, 2332L, 338L, 903L, 290L, 1064L, 892L,
96L, 285L, 251L, 181L, 189L, 1295L, 548L, 281L, 264L, 1051L,
590L, 190L, 2395L, 462L, 92L, 336L, 1969L, 2157L, 336L, 213L),
status = c(1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L)), .Names = c("dis", "ftime", "status"), row.names = c(NA,
-58L), class = "data.frame")

dat$dis <- factor(dat$dis, levels=c(0,1), labels= c("AML", "ALL"))

fit <-  with(dat, CumIncidence(ftime, status, dis, cencode = 0, xlab="Days"))

fit

#### lilchaos

##### New Member
Hey well the snippet i gave you should be used right after bmt=read.csv(file.choose()) so that whatever calculations you do is done only with the subset you want.

But i guess trinkers solution might be better

#### obruck

##### New Member
HUGE THANKS to both of you, especially trinker. This solved the issue (and saved my day)!

#### obruck

##### New Member
Hi!

I encountered two challenges that I couldn't solve by myself and thought that maybe you would know what to do.

1. Plots to continue horizontally
So as you see in the plot you posted as an answer to the thread (see link above), the lower line promptly ends after the last value. Could it be made to continue horizontally after the last value?

2. Having 3 groups instead of 2
I tried also having 3 groups instead of 2 by coding three different diseases in the .xlsx file (0, 1 or 2). However, I manage to do the analysis and plots only for 2 groups. Do you know what is the issue with this?

Thank you very much for your help in advance!

#### lilchaos

##### New Member
Hey,

genreally it would be helpful if you could post your code again, this saves everyone going through the thread and imagining what your data now looks like.

unfortunately i cant see the picture so ill leave question 1 to someone else.

for question 2: have you changed your dataset, eg did you generate new columns?

#### obruck

##### New Member
Hi!

Thank you for your answer. I have attached an excel file with some example data.

Column A ("dis"): patient disease. 0 = ALL, 1 = AML, 2 = CML.
Column B ("ftime"): follow-up time in days.
Column C ("status"): disease status. 0 = remission, 1 = no remission.

So I would like to do cumulative incidence status using this script http://www.stat.unipg.it/luca/R/CumIncidence.R as a function. The script in RStudio I write manually is
# Read the data file
> bmt2=read.csv(file.choose())
# Attach the data file to be able to work with the variable names
> attach(bmt2)
# Open the R package "Cumulative incidence"
> source (file.choose())
# Do the analysis
fit=CumIncidence (ftime, status, dis, cencode = 0, xlab="Days")

However, I would like to change results like this:

1. Plots to continue horizontally
Plots promptly ends after the last value (see trinker's answer above). Could it be made to continue horizontally after the last value?

2. Having 3 groups instead of 2
I tried also having 3 groups (Column A "dis") instead of 2 by coding three different diseases in the .xlsx file. However, I manage to do the analysis and plots only for 2 groups. Do you know what is the issue with this?

Thank you very much for your help in advance!

Last edited: