
Time series plot for missing data


I have some sequential event data, and I want to plot the trend of missingness in value across time. Example below:

        id    time  value
1    aa122      1     1
2   aa2142      1     1
3   aa4341      1     1
4    bb132      1     2
5   bb2181      2     1
6   bb3242      2     3
7   bb3321      2     NA
8    cc122      2     1
9   cc2151      2     2
10  cc3241      3     1
11   dd161      3     3
12  dd2152      3     NA
13  dd3282      3     NA
14   ee162      3     1
15  ee2201      4     2
16  ee3331      4     NA
17  ff1102      4     NA
18  ff2141      4     NA
19  ff3232      5     1
20   gg142      5     3
21  gg2192      5     NA
22  gg3311      5     NA
23  gg4362      5     NA
24   ii111      5     NA

The number of NAs is supposed to increase over time (the behaviors are fading). How do I plot the number of NAs across time?


Solution

  • I think this is what you're looking for: you want to see how many NAs appear over time. Assuming that is correct, and treating each time point as a group, you can count the NAs in each group.

    data:

    df <- structure(list(id = structure(1:24, .Label = c("aa122", "aa2142", 
    "aa4341", "bb132", "bb2181", "bb3242", "bb3321", "cc122", "cc2151", 
    "cc3241", "dd161", "dd2152", "dd3282", "ee162", "ee2201", "ee3331", 
    "ff1102", "ff2141", "ff3232", "gg142", "gg2192", "gg3311", "gg4362", 
    "ii111"), class = "factor"), time = c(1L, 1L, 1L, 1L, 2L, 2L, 
    2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 
    5L, 5L), value = c(1L, 1L, 1L, 2L, 1L, 3L, NA, 1L, 2L, 1L, 3L, 
    NA, NA, 1L, 2L, NA, NA, NA, 1L, 3L, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
    -24L))
    
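    library(tidyverse)  # attaches dplyr and ggplot2, so a separate library(ggplot2) is not needed

    # Count the NAs in value within each time point
    df %>% 
      group_by(time) %>% 
      summarise(sumNA = sum(is.na(value)))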
    
    # A tibble: 5 × 2
       time sumNA
      <int> <int>
    1     1     0
    2     2     1
    3     3     2
    4     4     3
    5     5     4
    

    You can then pipe this summary into ggplot2 to plot it:

    # Summarise the NA count per time point, then draw it as a line
    df %>% 
      group_by(time) %>% 
      summarise(sumNA = sum(is.na(value))) %>%
      ggplot(aes(x = time, y = sumNA)) +
      geom_line()
    

    [Figure: line plot of sumNA against time]

    As the plot shows, the number of NAs increases as time increases.
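
The time points in the example do not all have the same number of rows (4, 5, 5, 4 and 6), so a raw count can partly reflect group size. As a possible variation on the approach above, and not something the question asked for, here is a minimal sketch that also computes the proportion of NAs per time point with mean(is.na(value)) and plots that instead. The names na_by_time, n and propNA are made up for this illustration, and the data frame is rebuilt without the id column, which is not used.

```r
library(dplyr)
library(ggplot2)

# Same time/value pairs as the example data; id is dropped because it is not used here.
df <- data.frame(
  time  = c(rep(1L, 4), rep(2L, 5), rep(3L, 5), rep(4L, 4), rep(5L, 6)),
  value = c(1L, 1L, 1L, 2L,
            1L, 3L, NA, 1L, 2L,
            1L, 3L, NA, NA, 1L,
            2L, NA, NA, NA,
            1L, 3L, NA, NA, NA, NA)
)

# Per time point: number of rows, number of NAs, and share of NAs.
# (na_by_time, n and propNA are illustrative names, not from the original answer.)
na_by_time <- df %>%
  group_by(time) %>%
  summarise(
    n      = n(),
    sumNA  = sum(is.na(value)),
    propNA = mean(is.na(value))  # mean of a logical vector is the share of TRUEs
  )

# Plot the share of missing values over time, with the y axis fixed to [0, 1].
p <- ggplot(na_by_time, aes(x = time, y = propNA)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(limits = c(0, 1)) +
  labs(x = "time", y = "proportion of NA in value")

print(p)
```

For this data the counts rise steadily (0, 1, 2, 3, 4), while the proportions are 0, 0.2, 0.4, 0.75 and about 0.67, so the two views can tell slightly different stories when group sizes vary. Which one is appropriate depends on whether you care about the absolute number of missing observations or the rate of missingness.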