I am trying to extract the unique ID (Package Id) for each of the 147 data packages on the Environmental Data Initiative (EDI) website for the Andrews LTER site. However, I can't figure out which selector to pass to rvest::html_nodes() to get the Package Id. Any ideas?
What I've been trying:
# Load required libraries
library(rvest)
library(dplyr)
# Define the URL of the website
url <- "http://portal.edirepository.org:80/nis/simpleSearch?defType=edismax&q=*:*&fq=-scope:ecotrends&fq=-scope:lter-landsat*&fq=scope:(knb-lter-and)&fl=id,packageid,title,author,organization,pubdate,coordinates&debug=false"
# Read the HTML content from the website
page <- read_html(url)
# Extract the relevant information
packageIds <- page %>%
  html_nodes("td[class='Package Id']") %>%
  html_text() # returns an empty character vector, character(0)
You could try something like this. It was a bit tricky, since I needed to append &start=0&rows=150 to the original query in order to load the full table. Then you can use html_table() to return the tables on the page, which in this case come back as a list. From that list, pick the element that holds the actual results table and select its Package Id column.
# Define the URL of the website
url <- "https://portal.edirepository.org/nis/simpleSearch?defType=edismax&q=*:*&fq=-scope:ecotrends&fq=-scope:lter-landsat*&fq=scope:(knb-lter-and)&fl=id,packageid,title,author,organization,pubdate,coordinates&debug=false&start=0&rows=150"
# Read the HTML content from the website
page <- read_html(url)
# Extract the relevant information
page %>%
  html_table() %>%
  .[[4]] %>%
  select(`Package Id ▵▿`) %>%
  rename(package_id = `Package Id ▵▿`)
# A tibble: 147 × 1
package_id
<chr>
1 knb-lter-and.2719.6
2 knb-lter-and.2720.8
3 knb-lter-and.2721.6
4 knb-lter-and.2722.6
5 knb-lter-and.2725.6
6 knb-lter-and.2726.6
7 knb-lter-and.4528.10
8 knb-lter-and.4541.3
9 knb-lter-and.4544.4
10 knb-lter-and.4547.5
# … with 137 more rows
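
If the position of the results table ever changes, the hard-coded .[[4]] would pick the wrong table. A minimal sketch of selecting the table by its column name instead is shown below. It runs on a small inline HTML stand-in built with minimal_html(), so the markup, titles, and the two rows are hypothetical and the real page may be structured differently; only the "Package Id" column prefix is taken from the output above.

```r
library(rvest)
library(dplyr)

# Hypothetical stand-in for the search results page: two tables, only one of
# which has a "Package Id" column. The real page's markup may differ.
page <- minimal_html('
  <table><tr><th>Menu</th></tr><tr><td>Home</td></tr></table>
  <table>
    <tr><th>Title</th><th>Package Id ▵▿</th></tr>
    <tr><td>Example A</td><td>knb-lter-and.2719.6</td></tr>
    <tr><td>Example B</td><td>knb-lter-and.2720.8</td></tr>
  </table>
')

# html_table() on a whole document returns a list with one tibble per <table>
tables <- html_table(page)

# Flag the tables that have a column whose name starts with "Package Id"
has_id <- vapply(
  tables,
  function(tbl) any(startsWith(names(tbl), "Package Id")),
  logical(1)
)

# Take the first matching table instead of relying on a fixed index
results <- tables[[which(has_id)[1]]]

# Rename the matched column while selecting it, then pull it out as a vector
package_ids <- results %>%
  select(package_id = starts_with("Package Id")) %>%
  pull(package_id)
```

Matching on the "Package Id" prefix also avoids having to type the sort-arrow characters in the column name. With the real page, replacing minimal_html(...) with read_html(url) should be the only change needed, assuming the results table still has that column header.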