Tags: r, dataframe, plot, histogram

How to graph all the occurrences of NA by bins of positions


I am fairly new to R, and I want to plot the number of NA occurrences across position as a histogram. I am starting to use ggplot, but I'm not sure where to begin.

df <- data.frame(position=c(3, 5, 6, 7, 16, 17, 19, 20, 21, 35, 46, 78, 98, 100, 
                            145, 146, 147), 
                 petal15=c(0, 1, 1, 1, 0, NA, 1, NA, NA, NA, 1, 0, 0, 1, NA, 
                           0, NA), 
                 petal20=c(1, 0, 0, 1, 1, 1, 1, 1, NA, NA, 1, 1, NA, NA, 0, 
                           0, 0))

Solution

  • If I understood you correctly, you are looking for this, a histogram of the number of NAs in each row:

    hist(rowSums(is.na(df)), xlab='occurrences', col=4, main="NA's")
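
    As a quick check of what goes into that histogram, the sketch below tabulates the per-row NA counts directly. It assumes the df from the question; the name na_per_row is introduced here for illustration only.

    ```r
    df <- data.frame(position=c(3, 5, 6, 7, 16, 17, 19, 20, 21, 35, 46, 78, 98, 100,
                                145, 146, 147),
                     petal15=c(0, 1, 1, 1, 0, NA, 1, NA, NA, NA, 1, 0, 0, 1, NA,
                               0, NA),
                     petal20=c(1, 0, 0, 1, 1, 1, 1, 1, NA, NA, 1, 1, NA, NA, 0,
                               0, 0))

    # Number of NAs in each row (the position column itself has no NAs)
    na_per_row <- rowSums(is.na(df))

    # How many rows contain 0, 1 or 2 NAs; these are the values hist() bins
    table(na_per_row)
    ```

    Note that this counts NAs per row, regardless of position, which is why the edit below bins by position instead.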
    


    Edit

    To implement the idea from your comment, we first need a list of the positions that belong to each bin of width 10 (1-10, 11-20, ..., 141-150).

    (pos <- lapply((0:14)*10 + 1, \(x) (0:9) + x))
    # [[1]]
    # [1]  1  2  3  4  5  6  7  8  9 10
    # 
    # [[2]]
    # [1] 11 12 13 14 15 16 17 18 19 20
    #
    # [...]
    #
    # [[15]]
    # [1] 141 142 143 144 145 146 147 148 149 150
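
    The same bins could also be expressed with cut(), which assigns each position to an interval instead of enumerating every position. This is only a sketch of an alternative, not the approach used in the rest of this answer; the names breaks and bin are illustrative.

    ```r
    position <- c(3, 5, 6, 7, 16, 17, 19, 20, 21, 35, 46, 78, 98, 100,
                  145, 146, 147)

    # Break points 0, 10, ..., 150 give the intervals (0,10], (10,20], ...,
    # (140,150], which cover the same integers as 1-10, 11-20, ..., 141-150
    breaks <- seq(0, 150, by = 10)
    bin <- cut(position, breaks = breaks)

    # Number of rows falling into each bin (empty bins are kept with count 0)
    table(bin)
    ```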
    

    Using sapply() we count the NAs per column within each bin, which gives the desired matrix, and pipe it to barplot().

    # One column per bin, one row per petal column; each cell is an NA count
    sapply(pos, \(i) colSums(is.na(df[df$position %in% i, -1]))) |>
      barplot(beside=TRUE, xlim=c(1, 48), ylim=c(0, 2.5), col=3:4,
              xlab='Occurrences', ylab='Freq', main="NA's",
              legend.text=names(df[-1]), args.legend=list(x='topright'),
              # bin labels of the form "1-10", "11-20", ...
              names.arg=Reduce(\(x, y) paste0(x, '-', y), 
                               as.data.frame(t(sapply(pos, range)))),
              cex.names=.8) 
    box()
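
    Before plotting, it can help to look at the matrix that sapply() returns. The sketch below rebuilds it from the data and the pos list above; na_mat is a name introduced here for illustration, and function(x) is used instead of \(x) only so the snippet also runs on R versions before 4.1.

    ```r
    df <- data.frame(position=c(3, 5, 6, 7, 16, 17, 19, 20, 21, 35, 46, 78, 98, 100,
                                145, 146, 147),
                     petal15=c(0, 1, 1, 1, 0, NA, 1, NA, NA, NA, 1, 0, 0, 1, NA,
                               0, NA),
                     petal20=c(1, 0, 0, 1, 1, 1, 1, 1, NA, NA, 1, 1, NA, NA, 0,
                               0, 0))
    pos <- lapply((0:14)*10 + 1, function(x) (0:9) + x)

    # Rows are the petal columns, columns are the 15 position bins
    na_mat <- sapply(pos, function(i) colSums(is.na(df[df$position %in% i, -1])))

    dim(na_mat)     # 2 rows (petal15, petal20) by 15 bins
    na_mat[, 1:4]   # NA counts for the bins 1-10, 11-20, 21-30 and 31-40
    ```

    Bins that contain no rows at all, such as 51-60, still appear as columns of zeros, so the bars stay aligned with the bin labels.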
    

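
    Since the question mentions ggplot, here is a rough ggplot2 sketch of a similar grouped bar chart. It is not part of the base R solution above: it assumes the ggplot2 package is installed, it bins with cut() rather than the pos list, and the names bin, na_long, n_na and p are made up for this example.

    ```r
    library(ggplot2)  # assumption: ggplot2 is installed

    df <- data.frame(position=c(3, 5, 6, 7, 16, 17, 19, 20, 21, 35, 46, 78, 98, 100,
                                145, 146, 147),
                     petal15=c(0, 1, 1, 1, 0, NA, 1, NA, NA, NA, 1, 0, 0, 1, NA,
                               0, NA),
                     petal20=c(1, 0, 0, 1, 1, 1, 1, 1, NA, NA, 1, 1, NA, NA, 0,
                               0, 0))

    # Assign every position to one of the intervals (0,10], ..., (140,150]
    bin <- cut(df$position, breaks = seq(0, 150, by = 10))

    # Long format: one row per (bin, petal column) pair with its NA count
    na_long <- do.call(rbind, lapply(names(df)[-1], function(v) {
      data.frame(bin = levels(bin),
                 variable = v,
                 n_na = tabulate(as.integer(bin)[is.na(df[[v]])],
                                 nbins = nlevels(bin)))
    }))
    # Keep the bins in numeric order on the x axis
    na_long$bin <- factor(na_long$bin, levels = levels(bin))

    p <- ggplot(na_long, aes(x = bin, y = n_na, fill = variable)) +
      geom_col(position = "dodge") +
      labs(x = "Position bin", y = "Number of NAs", title = "NA's") +
      theme(axis.text.x = element_text(angle = 45, hjust = 1))
    print(p)
    ```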

    Note: the \(x) function shorthand and the native pipe |> used above require R 4.1.0 or later.

    R.version.string
    # [1] "R version 4.1.2 (2021-11-01)"
    

    Data:

    df <- structure(list(position = c(3, 5, 6, 7, 16, 17, 19, 20, 21, 35, 
    46, 78, 98, 100, 145, 146, 147), petal15 = c(0, 1, 1, 1, 0, NA, 
    1, NA, NA, NA, 1, 0, 0, 1, NA, 0, NA), petal20 = c(1, 0, 0, 1, 
    1, 1, 1, 1, NA, NA, 1, 1, NA, NA, 0, 0, 0)), class = "data.frame", row.names = c(NA, 
    -17L))