Extract unique ID from table on webpage using rvest

I am trying to extract the unique ID (Package Id) for each of the 147 data packages on the Environmental Data Initiative (EDI) website for the Andrews LTER site. However, I can't figure out which selector to pass to rvest::html_nodes() to get the Package Id. Any ideas?

What I've been trying:

# Load required libraries
library(rvest)
library(dplyr)

# Define the URL of the website
url <- "http://portal.edirepository.org:80/nis/simpleSearch?defType=edismax&q=*:*&fq=-scope:ecotrends&fq=-scope:lter-landsat*&fq=scope:(knb-lter-and)&fl=id,packageid,title,author,organization,pubdate,coordinates&debug=false"

# Read the HTML content from the website
page <- read_html(url)

# Extract the relevant information
packageIds <- page %>%
  html_nodes("td[class='Package Id']") %>%
  html_text() # returns an empty character vector

Solution

You could try something like this. The tricky part was that I had to append &start=0&rows=150 to the original query in order to load the full table.

html_table() then returns the page's tables, which in this case come back as a list. Pick the list element that holds the actual results table (the fourth one here) and select the Package Id column from it.

# Define the URL of the website
url <- "https://portal.edirepository.org/nis/simpleSearch?defType=edismax&q=*:*&fq=-scope:ecotrends&fq=-scope:lter-landsat*&fq=scope:(knb-lter-and)&fl=id,packageid,title,author,organization,pubdate,coordinates&debug=false&start=0&rows=150"

# Read the HTML content from the website
page <- read_html(url)

# Extract the relevant information
page %>%
  html_table() %>%   # list with one tibble per <table> on the page
  .[[4]] %>%         # the fourth table is the search results
  select(package_id = `Package Id  ▵▿`)   # keep the ID column and rename it

# A tibble: 147 × 1
   package_id    
   <chr>               
 1 knb-lter-and.2719.6 
 2 knb-lter-and.2720.8 
 3 knb-lter-and.2721.6 
 4 knb-lter-and.2722.6 
 5 knb-lter-and.2725.6 
 6 knb-lter-and.2726.6 
 7 knb-lter-and.4528.10
 8 knb-lter-and.4541.3 
 9 knb-lter-and.4544.4 
10 knb-lter-and.4547.5 
# … with 137 more rows
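
The hard-coded index 4 and the exact header text `Package Id  ▵▿` both depend on the page's current layout. As a hedged sketch of a less brittle variant, the results table can be located by looking for a column whose name starts with "Package Id". The HTML below is a hypothetical stand-in built with minimal_html() so the example runs offline. The real page's markup may differ, and the titles are made up for illustration.

```r
library(rvest)
library(dplyr)

# Hypothetical stand-in for the search results page: one layout table
# followed by the results table (assumed structure, not the real markup).
page <- minimal_html('
  <table><tr><td>layout</td></tr></table>
  <table>
    <tr><th>Package Id</th><th>Title</th></tr>
    <tr><td>knb-lter-and.2719.6</td><td>Example title A</td></tr>
    <tr><td>knb-lter-and.2720.8</td><td>Example title B</td></tr>
  </table>
')

# One tibble per <table> element on the page
tables <- html_table(page)

# Flag the tables that have a column whose name starts with "Package Id"
is_results <- vapply(
  tables,
  function(tbl) any(startsWith(names(tbl), "Package Id")),
  logical(1)
)

# Take the first matching table and pull the ID column as a character vector
package_ids <- tables[[which(is_results)[1]]] %>%
  select(package_id = starts_with("Package Id")) %>%
  pull(package_id)

package_ids
```

On the real page, replacing the minimal_html() call with read_html(url) should be the only change needed, assuming the results table still has a header that begins with "Package Id".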