I would like to scrape links to content from this website:
https://www.forklift-international.com/en/for-sale/forklift-battery
The link is generated by JavaScript, but after inspecting the page source I can see that the URL pattern is present there. Here is the relevant part of the code for the first link:
<div class="card cardhighlight0 cpointer" itemscope="" itemtype="http://schema.org/Product" onclick="window.location='/en/e/Battery-used-RoyPow-S2450-12003t'">
In this example, I need to extract the following path, which I can then use to assemble the final URL:
/en/e/Battery-used-RoyPow-S2450-12003t
The problem is that I am not able to read this in. I do the following:
library(httr)
library(rvest)

response <- GET("https://www.forklift-international.com/en/for-sale/forklift-battery")
page <- read_html(response)
# html_text2() returns only the visible text content of each card
page_links <- page %>% html_elements(".card.cardhighlight0") %>% html_text2()
In the next step I would like to extract the pattern from the text with a regex, but I never got that far, because the needed pattern is not included in the parsed text. As I understand it, the path is part of the tag itself (its onclick attribute) rather than its text, so html_text2() does not pick it up. Any suggestions on how to deal with this?
You can access element attributes with rvest::html_attr():
library(rvest)
page <- read_html("https://www.forklift-international.com/en/for-sale/forklift-battery")
page %>%
  html_elements(".card.cardhighlight0") %>%
  html_attr("onclick")
#> [1] "window.location='/en/e/Battery-used-RoyPow-S2450-12003t'"
#> [2] "window.location='/en/e/Battery-used-RoyPow-S24160-12002t'"
#> [3] "window.location='/en/e/Battery-used-RoyPow-F80420A-12001t'"
#> [4] "window.location='/en/e/Battery-used-RoyPow-F48560X-12000t'"
#> [5] "window.location='/en/e/Battery-used-RoyPow-F24160-11999t'"
#> [6] "window.location='/en/e/Battery-used-GRUMA-48-Volt-4-PzS-620-Ah-11998t'"
#> [7] "window.location='/en/e/Battery-used-IBB-24-Volt-3-PzB-225-Ah-11996t'"
#> [8] "window.location='/en/e/Battery-used-Linde-24-Volt-3-PzS-375-Ah-11997t'"
#> [9] "window.location='/en/e/Battery-used-Hoppecke-48V-4-HPzS-500-11993t'"
#> [10] "window.location='/en/e/Battery-used-IBH-IBG-Smart-Low-Antimon-11992t'"
#> [11] "window.location='/en/e/Battery-used-GRUMA-24-Volt-8-PzS-1000-Ah-11991t'"
#> [12] "window.location='/en/e/Battery-used-Hoppecke-24V-3-HPzS-375-11990t'"
#> [13] "window.location='/en/e/Battery-used-IBV-24-Volt-4-PzS-620-Ah-11987t'"
#> [14] "window.location='/en/e/Battery-used-IBV-24-Volt-4-PzS-620-Ah-11988t'"
#> [15] "window.location='/en/e/Battery-used-IBV-24-Volt-4-PzS-620-Ah-11989t'"
#> [16] "window.location='/en/e/Battery-used-GRUMA-48-Volt-4-PzS-620-Ah-11985t'"
#> [17] "window.location='/en/e/Battery-used-%5Bdiv%5D-3-EPzS-465-11853t'"
#> [18] "window.location='/en/e/Battery-used-GRUMA-48-Volt-5-PzS-625-Ah-11981t'"
#> [19] "window.location='/en/e/Battery-used-AIM-48-Volt-5-PzS-775-Ah-11980t'"
#> [20] "window.location='/en/e/Battery-used-GRUMA-24-Volt-2-PzS-250-Ah-11979t'"
Created on 2024-01-12 with reprex v2.0.2
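To finish the job (strip the window.location='...' wrapper and build full URLs), here is a sketch of a hypothetical helper, card_urls(). It assumes every onclick value has exactly the form shown in the output above (anything else becomes NA) and that the paths are relative to the site root https://www.forklift-international.com/. It resolves them with xml2::url_absolute(). The offline check uses two cards copied from the output; the live call is left commented out because it needs network access.

```r
library(rvest)
library(xml2)

# Hypothetical helper: onclick targets of the listing cards -> absolute URLs.
# Assumes onclick looks like window.location='<path>'; other values give NA.
card_urls <- function(page, base = "https://www.forklift-international.com/") {
  onclick <- page %>%
    html_elements(".card.cardhighlight0") %>%
    html_attr("onclick")

  # Capture the quoted path inside window.location='...'
  pattern <- "^window\\.location='([^']+)'$"
  hit <- !is.na(onclick) & grepl(pattern, onclick)

  urls <- rep(NA_character_, length(onclick))
  # Resolve the site-relative paths against the site root
  urls[hit] <- url_absolute(sub(pattern, "\\1", onclick[hit]), base)
  urls
}

# Offline check on two cards copied from the output above (R >= 4.0 raw strings)
demo <- read_html(r"(
<div class="card cardhighlight0" onclick="window.location='/en/e/Battery-used-RoyPow-S2450-12003t'"></div>
<div class="card cardhighlight0" onclick="window.location='/en/e/Battery-used-RoyPow-S24160-12002t'"></div>
)")
demo_urls <- card_urls(demo)
demo_urls

# Against the live page (requires network access):
# card_urls(read_html("https://www.forklift-international.com/en/for-sale/forklift-battery"))
```

If some listings turn out to use a different class (the cardhighlight0 suffix hints that variants may exist, but that is an assumption), selecting on the attribute itself, for example html_elements("[onclick^='window.location']"), would be less tied to the page's styling classes.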