
How do I tell how many people are still using older versions of R?


I'm evaluating whether it's worth retaining support for old versions of R in the packages I maintain, since doing so adds maintenance overhead. To that end, I'd like to estimate how many people are still using R 3.5.

This sort of data is easy to find for web browsers, and I know that RStudio compiles download statistics for packages. But is there a comparable source of data on who is using (and thus presumably updating packages in) older versions of R?


Solution

  • Raw data appears to be available from http://cran-logs.rstudio.com/, but it is not grouped by R version. We can download the log for a particular day and count how many package requests came from each R version.

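    # Allow at least 5 minutes for the download to finish
    options(timeout = max(300, getOption("timeout")))
    
    library(dplyr)
    library(ggplot2)
    
    # Logs are stored one file per day, in a directory named for the year
    day <- "2023-04-03"
    year <- as.POSIXlt(day)$year + 1900
    gzfile <- paste0(day, '.csv.gz')
    fileurl <- paste0('http://cran-logs.rstudio.com/', year, '/', gzfile)
    download.file(fileurl, gzfile)
    
    dd <- readr::read_csv(gzfile)
    
    # Drop rows without an R version (and one stray non-version value),
    # then plot the number of requests per version
    dd %>%
      filter(!is.na(r_version) & r_version != "vosonSML") %>%
      count(r_version) %>%
      ggplot() +
      aes(r_version, n) +
      geom_col() +
      coord_flip()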
    

    [Figure: R versions accessing CRAN on 2023-04-03]

    These logs only cover requests that go to the RStudio CRAN mirror. That mirror is the default, so it probably accounts for most requests, but traffic to other CRAN mirrors is not counted. This summary also treats each request as independent, although the same computer was likely installing more than one package on a given day because of package dependencies and the like.
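
    One way to reduce that double counting is to count distinct clients rather than requests. The sketch below assumes the log's `ip_id` column identifies a client within a single day (as I understand the log format, the ids are reassigned daily, so they cannot be matched across days). It also collapses patch releases into their major.minor series, so that 3.5.0 through 3.5.3 are counted together. `count_clients_by_series` and `sample_logs` are made-up names, and the sample rows are invented purely for illustration.

    ```r
    library(dplyr)

    # Hypothetical helper: count distinct daily client ids per R major.minor series.
    # Assumes `logs` has `r_version` and `ip_id` columns, as in the daily log files.
    count_clients_by_series <- function(logs) {
      logs %>%
        # Keep only values that look like a version number (drops NA and stray text)
        filter(!is.na(r_version), grepl("^[0-9]+\\.[0-9]+", r_version)) %>%
        # "4.2.3" -> "4.2"
        mutate(r_series = sub("^([0-9]+\\.[0-9]+).*$", "\\1", r_version)) %>%
        # One row per (series, client), however many packages were requested
        distinct(r_series, ip_id) %>%
        count(r_series, name = "clients") %>%
        arrange(r_series)
    }

    # Invented rows standing in for one day's log
    sample_logs <- data.frame(
      r_version = c("4.2.3", "4.2.3", "4.2.1", "3.5.0", NA, "vosonSML"),
      ip_id     = c(1, 1, 2, 3, 4, 5),
      stringsAsFactors = FALSE
    )

    count_clients_by_series(sample_logs)
    ```

    On a real day's log you would call `count_clients_by_series(dd)`. Several machines behind one address still count as a single client, so treat the result as a rough indicator rather than a user count.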

    It's clear that most people are running at least R 4.0, but there is a long tail of other versions. For a more representative picture, you will probably want to sample across several dates.
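
    Sampling across dates could look something like the sketch below. It reuses the URL scheme from the code above. `cran_log_url` and `count_versions_for_day` are hypothetical helper names, and the choice of eight dates a week apart is arbitrary (sampling the same weekday every time may carry its own bias). The downloads only run in an interactive session, since each daily file can take a while to fetch.

    ```r
    # Build the download URL for one day's log
    cran_log_url <- function(day) {
      day <- as.Date(day)
      paste0("http://cran-logs.rstudio.com/", format(day, "%Y"), "/",
             format(day, "%Y-%m-%d"), ".csv.gz")
    }

    # Hypothetical helper: download one day's log and tally requests per R version
    count_versions_for_day <- function(day) {
      tmp <- tempfile(fileext = ".csv.gz")
      on.exit(unlink(tmp))
      download.file(cran_log_url(day), tmp, mode = "wb", quiet = TRUE)
      # Read every column as character to avoid type guessing differing between days
      logs <- readr::read_csv(tmp, col_types = readr::cols(.default = "c"))
      counts <- dplyr::count(dplyr::filter(logs, !is.na(r_version)), r_version)
      counts$day <- as.Date(day)
      counts
    }

    # Eight dates, one week apart (an arbitrary sampling scheme)
    days <- seq(as.Date("2023-02-06"), by = "week", length.out = 8)

    # Only hit the network when run by hand
    if (interactive()) {
      options(timeout = max(300, getOption("timeout")))
      sampled <- dplyr::bind_rows(lapply(days, count_versions_for_day))
    }
    ```

    The combined `sampled` table has one row per R version per day, so it can be summed over `day` or plotted as a time series to see whether the old-version tail is shrinking.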