
Range to histogram


I'm trying to build a histogram from my data. It looks like this: a data frame where each row holds a range of years. I need a histogram of all the years covered by those ranges.

year <- c("1925:2002",
          "2008",
          "1925:2002",
          "1925:2002",
          "1925:2002",
          "2008:2013",
          "1934",
          "1972:1988")

All I've managed so far is to convert each string to a sequence with seq(), but it doesn't work properly:

for (i in 1:length(year)) {
  rr[i] <- seq(
    as.numeric(unlist(strsplit(year[i], ":"))[1]),
    as.numeric(unlist(strsplit(year[i], ":"))[2])
  )
}
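The loop above fails for a few reasons. rr is not created before the loop. rr[i] <- tries to store a whole sequence in a single element of a vector. For a single year such as "2008", the split has no second element, so seq() gets NA as its end point. Here is a minimal corrected sketch of the same loop. It assumes every entry is either "YYYY" or "YYYY:YYYY", stores each expanded range in a list, and flattens the list afterwards (all_years and bounds are illustrative names):

```r
year <- c("1925:2002", "2008", "1925:2002", "1925:2002",
          "1925:2002", "2008:2013", "1934", "1972:1988")

# Pre-allocate a list: each element will hold one expanded range
rr <- vector("list", length(year))

for (i in seq_along(year)) {
  # "1925:2002" -> c(1925L, 2002L); "2008" -> 2008L
  bounds <- as.integer(strsplit(year[i], ":")[[1]])
  # Using the last element as the end point also covers single years
  rr[[i]] <- seq(bounds[1], bounds[length(bounds)])
}

# One integer vector with every year from every range
all_years <- unlist(rr)
hist(all_years)
```

The x[length(x)] trick in the answers below uses the same idea: a single year splits into one element, so its first and last element are the same.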

Here is an example:

[Image: base histogram]


Solution

  • Tick the answer box for @MrFlick's answer. I wrote this at the same time, and the only difference is the piping:

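    library(magrittr)
    
    strsplit(year, ":") %>%                            # "1925:2002" -> c("1925", "2002")
      lapply(as.integer) %>%                           # convert the parts to integers
      lapply(function(x) seq(x[1], x[length(x)])) %>%  # expand; a single year gives one value
      unlist() %>%                                     # one vector of all years
      hist()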
    

    Full-on tidyverse:

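    library(tidyverse)
    
    str_split(year, ":") %>%                  # split each range at ":"
      map(as.integer) %>%                     # convert the parts to integers
      map(~seq(.x[1], .x[length(.x)])) %>%    # expand each range; single years stay as-is
      flatten_int() %>%                       # superseded in purrr >= 1.0.0 by list_c()
      hist()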
    

    To back up my comments, in case any tidyverse diehards join the fray:

    library(tidyverse)
    library(microbenchmark)
    
    microbenchmark(
      base = as.integer(
        unlist(
          lapply(
            lapply(
              strsplit(year, ":"),
              as.integer
            ),
            function(x) seq(x[1], x[length(x)])
          ),
          use.names = FALSE
        )
      ),
      tidy = str_split(year, ":") %>%
        map(as.integer) %>% 
        map(~seq(.x[1], .x[length(.x)])) %>% 
        flatten_int()
    )
    ## Unit: microseconds
    ##  expr     min      lq     mean   median       uq      max neval
    ##  base  89.099  96.699 132.1684 102.5895 110.7165 2895.428   100
    ##  tidy 631.817 647.812 672.5904 667.8250 686.2740  909.531   100
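If you want to skip the list step entirely, base R (version 4.0.0 or later, where sequence() gained its from argument) can expand all ranges in one vectorised call. This is a sketch assuming every entry is either "YYYY" or "YYYY:YYYY". sub() pulls out the start and end year, and for a single year both are the same string. Because the data are whole years, table() gives the exact frequency per year, and breaks placed at half-year offsets give one histogram bar per year instead of the default binning. The names start, end, all_years and year_counts are illustrative, and this variant was not part of the benchmark above:

```r
year <- c("1925:2002", "2008", "1925:2002", "1925:2002",
          "1925:2002", "2008:2013", "1934", "1972:1988")

# Start year: drop ":" and everything after it ("2008" is left unchanged)
start <- as.integer(sub(":.*$", "", year))
# End year: drop everything up to and including ":" (no-op when there is no ":")
end <- as.integer(sub("^.*:", "", year))

# sequence(nvec, from) concatenates from[i], ..., from[i] + nvec[i] - 1 for each i
all_years <- sequence(nvec = end - start + 1L, from = start)

# Exact count for each year (the "frequency" view)
year_counts <- table(all_years)

# One bar per year: breaks at .5 offsets around every integer year
hist(all_years,
     breaks = seq(min(all_years) - 0.5, max(all_years) + 0.5, by = 1),
     main = "Years covered by the ranges", xlab = "Year")
```

This produces the same vector, in the same order, as the lapply() pipeline above.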