r, web-scraping, rvest

scraping data table from clinicaltrials.gov with rvest


I'd like to scrape the data table that appears when I enter search terms on clinicaltrials.gov. Specifically, I'd like to scrape the table on this page: https://clinicaltrials.gov/ct2/results?term=nivolumab+AND+Overall+Survival.

I've tried this code, but I don't think I have the right CSS selector:

library(magrittr)  # provides the %>% pipe used below

# create custom url
ctgov_url <- "https://clinicaltrials.gov/ct2/results?term=nivolumab+AND+Overall+Survival"
# read HTML page
ct_page <- rvest::read_html(ctgov_url)

# try to extract the results table
ct_page %>%
  # find the first element that matches a css selector
  rvest::html_element("t") %>%
  # parse the matched element into a data frame
  rvest::html_table()

Solution

  • You don't need rvest here at all. The results page has a download button that returns the search results as a CSV file. The download is a plain URL-encoded GET request, so you can wrap it in a small function:

    get_clin_trials_data <- function(terms, n = 1000) {

      # Join the search terms with AND and percent-encode the result
      terms <- URLencode(paste(terms, collapse = " AND "))

      # Request the CSV export for this search and read it into a data frame
      df <- read.csv(paste0(
        "https://clinicaltrials.gov/ct2/results/download_fields",
        "?down_count=", n, "&down_flds=shown&down_fmt=csv",
        "&term=", terms, "&flds=a&flds=b&flds=y"))

      dplyr::as_tibble(df)
    }
    

    The function takes a vector of search terms and a maximum number of results to return. There is no need for the HTML parsing that web scraping would require.

    get_clin_trials_data(c("nivolumab", "Overall Survival"), n = 10)
    #> # A tibble: 10 x 8
    #>     Rank Title     Status Study.Results Conditions Interventions Locations URL  
    #>    <int> <chr>     <chr>  <chr>         <chr>      <chr>         <chr>     <chr>
    #>  1     1 A Study ~ Compl~ No Results A~ Hepatocel~ ""            "Bristol~ http~
    #>  2     2 Nivoluma~ Activ~ No Results A~ Glioblast~ "Drug: Nivol~ "Duke Un~ http~
    #>  3     3 Nivoluma~ Unkno~ No Results A~ Melanoma   "Biological:~ "CHU d'A~ http~
    #>  4     4 Study of~ Compl~ Has Results   Advanced ~ "Biological:~ "Highlan~ http~
    #>  5     5 A Study ~ Unkno~ No Results A~ Brain Met~ "Drug: Fotem~ "Medical~ http~
    #>  6     6 Trial of~ Compl~ Has Results   Squamous ~ "Drug: Nivol~ "Stanfor~ http~
    #>  7     7 Nivoluma~ Compl~ No Results A~ MGMT-unme~ "Drug: Nivol~ "New Yor~ http~
    #>  8     8 Study of~ Compl~ Has Results   Squamous ~ "Biological:~ "Mayo Cl~ http~
    #>  9     9 Study of~ Compl~ Has Results   Non-Squam~ "Biological:~ "Mayo Cl~ http~
    #> 10    10 An Open-~ Unkno~ No Results A~ Squamous-~ "Drug: Nivol~ "IRCCS -~ http~
    

    Created on 2022-06-21 by the reprex package (v2.0.1)
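
To see exactly what request the function above sends, it can help to build the URL without fetching it. The following is a minimal sketch using base R only. The helper name `build_clin_trials_url` is hypothetical and not part of the original answer, and the `format()` call is an added assumption to keep large values of `n` from being written in scientific notation.

```r
# Hypothetical helper (not in the original answer): builds the download URL
# without fetching it, so the query string can be inspected first.
build_clin_trials_url <- function(terms, n = 1000) {
  # Join the terms with AND, then percent-encode spaces and other unsafe characters
  query <- URLencode(paste(terms, collapse = " AND "))

  # Assumption: format n as a plain number so 1e5 becomes "100000", not "1e+05"
  count <- format(n, scientific = FALSE)

  paste0(
    "https://clinicaltrials.gov/ct2/results/download_fields",
    "?down_count=", count, "&down_flds=shown&down_fmt=csv",
    "&term=", query, "&flds=a&flds=b&flds=y"
  )
}

url <- build_clin_trials_url(c("nivolumab", "Overall Survival"), n = 10)
print(url)
```

The printed URL can be pasted into a browser to check that the server returns a CSV before it is passed to `read.csv()`.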