
Calculate & Visualize "time to metastasis" in R

I am quite new to R and am struggling with the following issue.

I want to create a bar plot with two bars that visualizes the time to metastasis in a cohort of 100 patients, of whom 40 developed metastasis. One bar represents the patients who received immunotherapy, and the second bar represents the patients who did not receive immunotherapy.

  • 100 patients
  • 40 developed metastasis over time
  • 30 received immunotherapy

I created an Excel file with the following columns:

  1. Metastasis (yes = 1, no = 0)
  2. time (if a metastasis was found, i.e. 1 in the previous column: any number (1, 2, 3, ...))
  3. Immunotherapy (yes = 1, no = 0)

Basically, I have to

  1. Filter only the patients who developed metastasis (1)

  2. Create a barplot:

    • y-axis: TTM (time to metastasis)

    • x-axis: Immunotherapy (1 bar - yes, 1 bar - no)

I'd be thankful for any help,

kind regards


ggplot(data = M1, aes (x = Immunotherapy, y = Time))+ geom_col()


  • Update 2: See OP question2 in comments:

    We could use facet_wrap(). I added a Cancer_Type column to M1. The new data:

    M1 <- structure(list(Metastasis = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1), time = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 3, 5, 1, 3, 5, 8, 6, 7, 10, 7, 9, 9, 9, 3, 
    2, 5, 9, 4, 9, 6, 8, 1, 6, 9, 6, 10, 6, 6, 4, 7, 6, 10, 5, 5, 
    2, 9, 6, 1, 1, 2), Immunotherapy = c(0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1), Cancer_Type = c("Colon cancer", "Lung cancer", 
    "Colon cancer", "Lung cancer", "Colon cancer", "Colon cancer", 
    "Colon cancer", "Lung cancer", "Colon cancer", "Colon cancer", 
    "Colon cancer", "Lung cancer", "Colon cancer", "Lung cancer", 
    "Lung cancer", "Colon cancer", "Lung cancer", "Colon cancer", 
    "Colon cancer", "Lung cancer", "Colon cancer", "Lung cancer", 
    "Lung cancer", "Colon cancer", "Colon cancer", "Lung cancer", 
    "Lung cancer", "Colon cancer", "Lung cancer", "Lung cancer", 
    "Lung cancer", "Lung cancer", "Lung cancer", "Colon cancer", 
    "Colon cancer", "Colon cancer", "Colon cancer", "Lung cancer", 
    "Colon cancer", "Lung cancer", "Lung cancer", "Colon cancer", 
    "Colon cancer", "Colon cancer", "Lung cancer", "Lung cancer", 
    "Colon cancer", "Colon cancer", "Colon cancer", "Lung cancer", 
    "Lung cancer", "Colon cancer", "Lung cancer", "Colon cancer", 
    "Colon cancer", "Lung cancer", "Lung cancer", "Lung cancer", 
    "Colon cancer", "Lung cancer", "Colon cancer", "Lung cancer", 
    "Lung cancer", "Lung cancer", "Colon cancer", "Colon cancer", 
    "Lung cancer", "Lung cancer", "Colon cancer", "Lung cancer", 
    "Colon cancer", "Lung cancer", "Colon cancer", "Colon cancer", 
    "Colon cancer", "Colon cancer", "Lung cancer", "Colon cancer", 
    "Lung cancer", "Colon cancer", "Colon cancer", "Colon cancer", 
    "Lung cancer", "Colon cancer", "Colon cancer", "Colon cancer", 
    "Colon cancer", "Lung cancer", "Lung cancer", "Lung cancer", 
    "Colon cancer", "Colon cancer", "Colon cancer", "Lung cancer", 
    "Lung cancer", "Lung cancer", "Lung cancer", "Colon cancer", 
    "Colon cancer", "Lung cancer")), row.names = c(NA, -100L), class = "data.frame")

    The code with facet_wrap():

    library(dplyr)    # %>%, mutate(), filter()
    library(ggplot2)

    M1 %>% 
      # recode 0/1 as a labelled factor so the x-axis reads "no"/"yes"
      mutate(Immunotherapy = factor(as.character(Immunotherapy), labels = c("no", "yes"))) %>% 
      # keep only the patients who developed metastasis
      filter(Metastasis >= 1) %>% 
      ggplot(aes(x = Immunotherapy, y = time, fill = Immunotherapy)) +
      geom_boxplot(outlier.shape = NA, alpha = 0.8, color = "black") +
      geom_jitter(width = 0.2, alpha = 0.5, size = 3) +
      scale_fill_manual(values = c("#E69F00", "#56B4E9")) +
      labs(x = "Immunotherapy", y = "Time to metastasis (months)") +
      scale_y_continuous(breaks = seq(0, ceiling(max(M1$time)), by = 2)) +
      # one panel per cancer type
      facet_wrap(~ Cancer_Type)

    Update 1: See the OP's questions in the comments:

    Question 1:

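    • The lines above and below the box are the so-called whiskers of the box-and-whisker plot.
    • They show the range of the data outside the box (the box itself spans the interquartile range, IQR).
    • Each whisker extends to the most extreme data point that lies within 1.5 times the IQR of the box edge, so it can be shorter than 1.5 × IQR.
    • Any data points beyond the whiskers are considered outliers and are normally plotted as individual points. In the code here, outlier.shape = NA hides them because geom_jitter() already draws every point.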
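
To make the whisker rule concrete, here is a minimal base R sketch that computes the box edges, the 1.5 × IQR fences, and the resulting whisker ends by hand. The vector ttm is made-up illustration data (not the OP's), and the value 30 is included on purpose so that there is one outlier.

```r
# Hypothetical times to metastasis (months); 30 is a deliberate outlier
ttm <- c(1, 2, 3, 3, 4, 5, 5, 6, 6, 7, 9, 30)

q   <- quantile(ttm, c(0.25, 0.75))  # lower and upper edge of the box
iqr <- IQR(ttm)                      # height of the box

# Fences: the furthest a whisker is allowed to reach
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[2]] + 1.5 * iqr

# Whiskers end at the most extreme observations inside the fences
whisker_low  <- min(ttm[ttm >= lower_fence])
whisker_high <- max(ttm[ttm <= upper_fence])

# Anything beyond the fences would be drawn as an individual outlier point
outliers <- ttm[ttm < lower_fence | ttm > upper_fence]

print(c(low = whisker_low, high = whisker_high))
print(outliers)
```

Here the upper whisker stops at 9, well short of the fence, because it always ends on an actual observation.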

    Question 2: I think you mean Immunotherapy and not metastasis:

    With this line: mutate(Immunotherapy = factor(as.character(Immunotherapy), labels = c("no", "yes")))

    This transforms Immunotherapy (0, 1) from numeric to a factor. A factor has levels (the distinct values) and labels (the names shown for them), so we can assign labels such as "no" and "yes", as in the line above. The key point is the order: labels are matched to the levels in sorted order. Since 0 sorts before 1, the first label, "no", goes to 0 and the second, "yes", goes to 1.
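
A small sketch of that mapping, using a made-up four-element vector rather than the real column. The second variant, which spells out levels explicitly, is a suggestion and not part of the original answer; it gives the same result here but does not rely on the implicit sort order.

```r
# Hypothetical 0/1 codes, as in the Immunotherapy column
immuno <- c(0, 1, 1, 0)

# Same call as in the answer: levels are the sorted unique values "0", "1",
# and the labels are assigned to them in that order
f <- factor(as.character(immuno), labels = c("no", "yes"))
print(as.character(f))

# More explicit alternative: state which value gets which label
f2 <- factor(immuno, levels = c(0, 1), labels = c("no", "yes"))
print(levels(f2))
```

If the labels were given as c("yes", "no") in the first call, 0 would silently be labelled "yes", which is why the order matters.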

    First answer: We could do something like this:

    library(dplyr)    # %>%, mutate(), filter()
    library(ggplot2)

    # create fake data: 100 patients, 40 with metastasis, 30 with immunotherapy
    set.seed(1)       # arbitrary seed so the random times are reproducible
    M1 <- data.frame(
      Metastasis = c(rep(0, 60), rep(1, 40)),
      time = c(rep(0, 60), sample(1:10, 40, replace = TRUE)),
      Immunotherapy = c(rep(0, 70), rep(1, 30))
    )

    M1 %>% 
      # recode 0/1 as a labelled factor so the x-axis reads "no"/"yes"
      mutate(Immunotherapy = factor(as.character(Immunotherapy), labels = c("no", "yes"))) %>% 
      # keep only the patients who developed metastasis
      filter(Metastasis >= 1) %>% 
      ggplot(aes(x = Immunotherapy, y = time, fill = Immunotherapy)) +
      geom_boxplot(outlier.shape = NA, alpha = 0.8, color = "black") +
      geom_jitter(width = 0.2, alpha = 0.5, size = 3) +
      scale_fill_manual(values = c("#E69F00", "#56B4E9")) +
      labs(x = "Immunotherapy", y = "Time to metastasis (months)") +
      scale_y_continuous(breaks = seq(0, ceiling(max(M1$time)), by = 2))

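
If a bar plot with exactly two bars is required, as in the question, each bar has to show a single summary value per group. The sketch below assumes the mean is the summary of interest (the question does not say) and uses base R on a tiny made-up data frame, toy, that only mirrors the column names of M1.

```r
# Hypothetical miniature version of M1 (same column names as in the question)
toy <- data.frame(
  Metastasis    = c(0, 0, 1, 1, 1, 1),
  time          = c(0, 0, 2, 4, 6, 10),
  Immunotherapy = c(0, 1, 0, 0, 1, 1)
)

# Step 1: keep only patients who developed metastasis
mets <- toy[toy$Metastasis == 1, ]

# Step 2: one summary value per group (here: mean time to metastasis)
ttm_mean <- aggregate(time ~ Immunotherapy, data = mets, FUN = mean)
print(ttm_mean)

# Step 3: two bars; aggregate() sorts the groups, so 0 ("no") comes first
barplot(ttm_mean$time,
        names.arg = c("no", "yes"),
        xlab = "Immunotherapy",
        ylab = "Mean time to metastasis")
```

A bar of means hides the spread of the individual times, which is the reason the answer above uses a box plot with jittered points instead.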