Plots to different charts

#1
Hi!

My situation is very much the same as in this image http://www.nature.com/bmt/journal/v40/n4/fig_tab/1705727f1.html#figure-title

I am comparing patients from two diseases (ALL, AML) with two different disease status (1,2). I am trying to calculate with cumulative incidence how many ALL and AML patients achieve the value 1 from all ALL and AML patients, respectively. However, the chart displays both values so I get 4 plots instead of 2.

What should I do to have a plot with "ALL1" and "AML1" only?

Thank you in advance for your help!
 
#2
Not sure if i understood you correctly but maybe the easiest and fastest would be to use a subset where Disease status is only 1?

something like:

data.sub <- subset(data, data$diesease.Status == 1)
 
#3
Thank you for the tip! It seems to me that you understood my point. However, I did not manage to get this subset to work.

So I have my data in 3 columns named "dis" as the disease type (ALL or AML), ftime as time and status as the status state (0 = Censored, 1 = Remission, 2 = Non-remission). I want to the analysis to take both disease types and Remission vs. Non-remission status states into account. However, I want only plot the status "Remission" according to the disease (ALL or AML) and not the non-remission data.

This is the script
# Read the data file
> bmt=read.csv(file.choose())
# Attach the data file to be able to work with the variable names
> attach(bmt3)
# Instead of number codes, the corresponding label may be displayed in the output
> dis=factor(dis, levels=c(0,1), labels= c("AML”, “ALL"))
# Open the R package "Cumulative incidence"
> source (file.choose())
# Do the analysis
fit=CumIncidence (ftime, status, dis, cencode = 0, xlab="Days")

AND here I got the image as in the article above. I guess I need to plot only the subset at this point. What should I do to subset only AML and ALL patients that have status = 1 ("Remission")?

Thank in advance!
 
#4
Hey,

to be honest i have never worked with a script as you displayed...i guess you did not code it by yourself?

I guess your data will be bmt, right? Can you make a sample table to see the structure of bmt?

i would say:

bmt <- subset(bmt, bmt$status == 1), should work but no guarantee.
 
#5
The creator of that script is actually the author of the nature I posted. I happened to need the same script. And I happen also to have very basic coding skills so I appreciate all help I get!

I tried the code you suggested and added at the very end. R definitely understood it but it did nothing to the plot. Should there be some extra coding to do to draw the new plot?

Thanks again for your interest.

As a possible alternative I came to think if something should be altered in the original Cum.Incidence coding http://www.stat.unipg.it/luca/R/CumIncidence.R ?
 

trinker

ggplot2orBust
#6
This question would be improved if you told use where CumIncidence came from and used dput to share code (as I do below) not attaching a csv/xlsx. I googled and got the following script for the function you use but don't include: http://www.stat.unipg.it/luca/R/CumIncidence.R and so used that in my answer below. If I understood the problem more and I were you I'd do this myself and make it tailored to my own needs maybe with ggplot2's geom_step function.

Here's my plot:



All I have done is tweaked the following lines of code in the function. FUll code and reproducible data example below that (changes in red):

Code:
      matplot(time, base::t(x$est)[COLOR="red"][,1:2][/COLOR], type="s", ylim = c(0,1), 
              xlab = xlab, ylab = ylab, xaxs="i", yaxs="i", 
              col = col, lty = lty, lwd = lwd)
      legend("topleft", legend =  rownames(x$est)[COLOR="red"][1:2][/COLOR], x.intersp = 2, 
             bty = "n", xjust = 1, col = col, lty = lty, lwd = lwd)
Full code:

Code:
#############################################################################
#                                                                           #
#                 CUMULATIVE INCIDENCE CURVES IN R                          #
#                                                                           #
# Written by Luca Scrucca                                                   #
#                                                                           #
# Reference:                                                                #
# Scrucca L., Santucci A., Aversa F. (2007) Competing risks analysis using  #
#   R: an easy guide for clinicians. Bone Marrow Transplantation, 40,       #
#   381--387.                                                               #
#############################################################################
# ver. 1.1 Feb 2008
#          - allow group to be missing
#          - if t is provided both computation and plots use t as time points
#          - allow col, lwd to be used for curves with confidence bands
#          - fix some bugs in the legend
#          - added help on source code
# ver. 1.0 May 2007
#          - Version appearing in the BMT paper
#############################################################################
#
# Usage:
# 
#   CumIncidence(ftime, fstatus, group, t, strata, rho = 0, cencode = 0,
#             	 subset, na.action = na.omit, level, 
#                xlab = "Time", ylab = "Probability", 
#                col, lty, lwd, digits = 4)
# 
# Arguments:
# 
# ftime = failure time variable.
# fstatus = variable with distinct codes for different causes of 
#           failure and also a distinct code for censored observations.
# group	= estimates will be calculated within groups given by distinct 
#         values of this variable. Tests will compare these groups. If 
#         missing then treated as all one group (no test statistics).
# t = a vector of time points where the cumulative incidence function 
#     should be evaluated.
# strata = stratification variable. Has no effect on estimates. Tests 
#          will be stratified on this variable. (all data in 1 stratum,
#          if missing).
# rho = power of the weight function used in the tests. By default is 
#       set to 0.
# cencode = value of fstatus variable which indicates the failure time
#           is censored.
# subset = a logical vector specifying a subset of cases to include in 
#          the analysis.
# na.action = a function specifying the action to take for any cases 
#             missing any of ftime, fstatus, group, strata, or subset. 
#             By default missing cases are omitted.
# level = a value in the range [0,1] specifying the level for pointwise
#         confidence bands.
# xlab = text for the x-axis label.
# ylab = text for the y-axis label.
# col = color(s) used for plotting curves (see plot.default).
# lty = line type(s) used for plotting curves (see plot.default).
# lwd = line width(s) used for plotting curves (see plot.default).
# digits = number of significant digits used for printing values. By 
#          default set at 4.
# 
#############################################################################

"CumIncidence" <- function(ftime, fstatus, group, t, strata, rho = 0, 
	                         cencode = 0, subset, na.action = na.omit, level,
	                         xlab = "Time", ylab = "Probability", 
	                         col, lty, lwd, digits = 4)
{
  # check for the required package
  if(!require("cmprsk"))
    { stop("Package `cmprsk' is required and must be installed.\n 
           See help(install.packages) or write the following command at prompt
           and then follow the instructions:\n
           > install.packages(\"cmprsk\")") } 
  # collect data
  mf  <- match.call(expand.dots = FALSE)
  mf[[1]] <- as.name("list")
  mf$t <- mf$digits <- mf$col <- mf$lty <- mf$lwd <- mf$level <- 
  mf$xlab <- mf$ylab <- NULL
  mf <- eval(mf, parent.frame())
  g <- max(1, length(unique(mf$group)))
  s <- length(unique(mf$fstatus))
  if(missing(t)) 
    { time <- pretty(c(0, max(mf$ftime)), 6)
      ttime <- time <- time[time < max(mf$ftime)] }
  else { ttime <- time <- t }
  # fit model and estimates at time points
  fit   <- do.call("cuminc", mf)
  tfit <- timepoints(fit, time)

  # print result
  cat("\n+", paste(rep("-", 67), collapse=""), "+", sep ="")
  cat("\n| Cumulative incidence function estimates from competing risks data |")
  cat("\n+", paste(rep("-", 67), collapse=""), "+\n", sep ="")
  tests <- NULL
  if(g > 1)
    { tests <- fit$Tests
	    colnames(tests) <- c("Statistic", "p-value", "df")
      cat("Test equality across groups:\n")
      print(tests, digits = digits) }
  cat("\nEstimates at time points:\n")
  print(tfit$est, digits = digits)
  cat("\nStandard errors:\n")
  print(sqrt(tfit$var), digits = digits)
  #
  if(missing(level))
    { # plot cumulative incidence functions
      if(missing(t))
        { time <- sort(unique(c(ftime, time)))
          x <- timepoints(fit, time) }
      else x <- tfit
      col <- if(missing(col)) rep(1:(s-1), rep(g,(s-1))) else col
      lty <- if(missing(lty)) rep(1:g, s-1) else lty
      lwd <- if(missing(lwd)) rep(1, g*(s-1)) else lwd      
      matplot(time, base::t(x$est)[,1:2], type="s", ylim = c(0,1), 
              xlab = xlab, ylab = ylab, xaxs="i", yaxs="i", 
              col = col, lty = lty, lwd = lwd)
      legend("topleft", legend =  rownames(x$est)[1:2], x.intersp = 2, 
             bty = "n", xjust = 1, col = col, lty = lty, lwd = lwd)
      out <- list(test = tests, est = tfit$est, se = sqrt(tfit$var))
    }
  else
    { if(level < 0 | level > 1) 
        error("level must be a value in the range [0,1]")
      # compute pointwise confidence intervals
      oldpar <- par(ask=TRUE)
      on.exit(par(oldpar))
      if(missing(t))
        { time <- sort(unique(c(ftime, time)))
          x <- timepoints(fit, time) }
      else x <- tfit
      z <- qnorm(1-(1-level)/2)
      lower <- x$est ^ exp(-z*sqrt(x$var)/(x$est*log(x$est)))
      upper <- x$est ^ exp(z*sqrt(x$var)/(x$est*log(x$est)))
      col <- if(missing(col)) rep(1:(s-1), rep(g,(s-1))) 
             else             rep(col, g*(s-1))
      lwd <- if(missing(lwd)) rep(1, g*(s-1)) 
             else             rep(lwd, g*(s-1))      
      # plot pointwise confidence intervals
      for(j in 1:nrow(x$est))
         { matplot(time, cbind(x$est[j,], lower[j,], upper[j,]), type="s", 
                   xlab = xlab, ylab = ylab, xaxs="i", yaxs="i", 
                   ylim = c(0,1), col = col[j], lwd = lwd[j], lty = c(1,3,3))
           legend("topleft", legend =  rownames(x$est)[j], bty = "n", xjust = 1) }
      # print pointwise confidence intervals
      i <- match(ttime, time)
      ci <- array(NA, c(2, length(i), nrow(lower)))
      ci[1,,] <- base::t(lower[,i])
      ci[2,,] <- base::t(upper[,i])
      dimnames(ci) <- list(c("lower", "upper"), ttime, rownames(lower))
      cat(paste("\n", level*100, "% pointwise confidence intervals:\n\n", sep=""))
      print(ci, digits = digits)
      out <- list(test = tests, est = x$est, se = sqrt(tfit$var), ci = ci)
    }
  # return results
  invisible(out)
}

dat <- structure(list(dis = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
    ), ftime = c(811L, 1595L, 810L, 362L, 1144L, 856L, 2923L, 741L, 
    927L, 672L, 1272L, 1502L, 2137L, 1388L, 2617L, 536L, 2496L, 367L, 
    994L, 144L, 461L, 752L, 730L, 277L, 477L, 180L, 992L, 1421L, 
    3218L, 1447L, 458L, 456L, 2332L, 338L, 903L, 290L, 1064L, 892L, 
    96L, 285L, 251L, 181L, 189L, 1295L, 548L, 281L, 264L, 1051L, 
    590L, 190L, 2395L, 462L, 92L, 336L, 1969L, 2157L, 336L, 213L), 
        status = c(1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
        2L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
        1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 
        1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
        1L)), .Names = c("dis", "ftime", "status"), row.names = c(NA, 
    -58L), class = "data.frame")


dat$dis <- factor(dat$dis, levels=c(0,1), labels= c("AML", "ALL"))

fit <-  with(dat, CumIncidence(ftime, status, dis, cencode = 0, xlab="Days"))

fit
 
#7
Hey well the snippet i gave you should be used right after bmt=read.csv(file.choose()) so that whatever calculations you do is done only with the subset you want.

But i guess trinkers solution might be better ;)
 
#9
Hi!

I encountered two challenges that I couldn't solve by myself and thought that maybe you would know what to do.

1. Plots to continue horizontally
So as you see in the plot you posted as an answer to the thread (see link above), the lower line promptly ends after the last value. Could it be made to continue horizontally after the last value?

2. Having 3 groups instead of 2
I tried also having 3 groups instead of 2 by coding three different diseases in the .xlsx file (0, 1 or 2). However, I manage to do the analysis and plots only for 2 groups. Do you know what is the issue with this?

Thank you very much for your help in advance!
 
#10
Hey,

genreally it would be helpful if you could post your code again, this saves everyone going through the thread and imagining what your data now looks like.

unfortunately i cant see the picture so ill leave question 1 to someone else.

for question 2: have you changed your dataset, eg did you generate new columns?
 
#11
Hi!

Thank you for your answer. I have attached an excel file with some example data.

Column A ("dis"): patient disease. 0 = ALL, 1 = AML, 2 = CML.
Column B ("ftime"): follow-up time in days.
Column C ("status"): disease status. 0 = remission, 1 = no remission.

So I would like to do cumulative incidence status using this script http://www.stat.unipg.it/luca/R/CumIncidence.R as a function. The script in RStudio I write manually is
# Read the data file
> bmt2=read.csv(file.choose())
# Attach the data file to be able to work with the variable names
> attach(bmt2)
# Open the R package "Cumulative incidence"
> source (file.choose())
# Do the analysis
fit=CumIncidence (ftime, status, dis, cencode = 0, xlab="Days")

However, I would like to change results like this:

1. Plots to continue horizontally
Plots promptly ends after the last value (see trinker's answer above). Could it be made to continue horizontally after the last value?

2. Having 3 groups instead of 2
I tried also having 3 groups (Column A "dis") instead of 2 by coding three different diseases in the .xlsx file. However, I manage to do the analysis and plots only for 2 groups. Do you know what is the issue with this?

Thank you very much for your help in advance!
 
Last edited: