
Extract unique ID from table on webpage using rvest


I am trying to extract the unique ID (Package Id) for each of the 147 data packages listed on the Environmental Data Initiative (EDI) website for the Andrews LTER site. However, I can't figure out which selector to pass to rvest::html_nodes() to get the Package Id. Any ideas?

What I've been trying:

# Load required libraries
library(rvest)
library(dplyr)

# Define the URL of the website
url <- "http://portal.edirepository.org:80/nis/simpleSearch?defType=edismax&q=*:*&fq=-scope:ecotrends&fq=-scope:lter-landsat*&fq=scope:(knb-lter-and)&fl=id,packageid,title,author,organization,pubdate,coordinates&debug=false"

# Read the HTML content from the website
page <- read_html(url)

# Extract the relevant information
packageIds <- page %>%
  html_nodes("td[class='Package Id']") %>%
  html_text() # returns an empty character vector, character(0)



Solution

  • You could try something like this. The tricky part is that &start=0&rows=150 has to be appended to the original query so that the page loads the full table.

    html_table() then returns the tables on the page as a list. Pick the list element that holds the results table (the fourth one in this case) and select its Package Id column.

    # Load required libraries
    library(rvest)
    library(dplyr)

    # Define the URL of the website (note the appended &start=0&rows=150)
    url <- "https://portal.edirepository.org/nis/simpleSearch?defType=edismax&q=*:*&fq=-scope:ecotrends&fq=-scope:lter-landsat*&fq=scope:(knb-lter-and)&fl=id,packageid,title,author,organization,pubdate,coordinates&debug=false&start=0&rows=150"
    
    # Read the HTML content from the website
    page <- read_html(url)
    
    # Extract the relevant information: html_table() returns a list of tables,
    # and the fourth one holds the search results
    page %>%
      html_table() %>%
      .[[4]] %>%
      select(package_id = `Package Id  ▵▿`)
    
    # A tibble: 147 × 1
       package_id    
       <chr>               
     1 knb-lter-and.2719.6 
     2 knb-lter-and.2720.8 
     3 knb-lter-and.2721.6 
     4 knb-lter-and.2722.6 
     5 knb-lter-and.2725.6 
     6 knb-lter-and.2726.6 
     7 knb-lter-and.4528.10
     8 knb-lter-and.4541.3 
     9 knb-lter-and.4544.4 
    10 knb-lter-and.4547.5 
    # … with 137 more rows
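
The selector in the question, td[class='Package Id'], likely matched nothing because it looks for cells whose class attribute is literally "Package Id", whereas "Package Id" appears in the output above as column header text. If the hard-coded table index or the exact header (with its sort arrows) ever changes, a slightly more defensive variant might help. The following is only a sketch under assumptions: extract_package_ids() is a hypothetical helper, and demo_page is a made-up stand-in for the real page (with "^v" standing in for the sort arrows) so that the example runs offline.

```r
library(rvest)
library(dplyr)

# Hypothetical stand-in for the search-results page: a layout table
# followed by the results table, whose header carries extra characters.
demo_page <- minimal_html('
  <table><tr><th>Menu</th></tr><tr><td>Home</td></tr></table>
  <table>
    <tr><th>Title</th><th>Package Id ^v</th></tr>
    <tr><td>Dataset A</td><td>knb-lter-and.2719.6</td></tr>
    <tr><td>Dataset B</td><td>knb-lter-and.2720.8</td></tr>
  </table>')

# Hypothetical helper: find the first table that has a column whose name
# starts with "Package Id" and return that column as package_id.
extract_package_ids <- function(page) {
  tables <- html_table(page)
  has_id <- vapply(
    tables,
    function(tb) any(startsWith(names(tb), "Package Id"), na.rm = TRUE),
    logical(1)
  )
  if (!any(has_id)) {
    stop("No table with a 'Package Id' column found")
  }
  tables[[which(has_id)[1]]] %>%
    select(package_id = starts_with("Package Id"))
}

ids <- extract_package_ids(demo_page)
print(ids)
```

With the real site, the same helper would be called as extract_package_ids(page) on the page read above. Matching on the start of the column name avoids depending on the table's position in the list or on the exact characters that follow "Package Id" in the header.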