I'm trying to scrape prices from Bloomberg. I can get the current price as shown below, but I can't get the previous price. What am I doing wrong?
library(rvest)
url <- "https://www.bloomberg.com/quote/WORLD:IND"
price <- read_html(url) %>%
  html_nodes("div.overviewRow__66339412a5 span.priceText__06f600fa3e") %>%
  html_text()
prevprice <- read_html(url) %>%
  html_nodes("div.value__7e29a7c90d") %>%
  html_text() # returns 0
prevprice <- read_html(url) %>%
  html_nodes(xpath = '//section') %>%
  html_text() %>%
  as.data.frame() # didn't find the price
Thanks in advance.
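So, there are at least two initial options:
1. Extract the previous closing price directly from the page text, where it appears in URL-encoded JSON under previousClosingPriceOneTradingDayAgo.
2. Calculate it from the current price and the percentage change shown on the page.
I show both options in the code below.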
I've also adapted the CSS selectors to use attribute selectors with the starts-with operator (^=). This makes the code more robust, because the class names in the HTML appear to be dynamic, with only the start of each class attribute value being stable.
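As a small offline illustration of the starts-with selector, here is a sketch that runs against made-up markup rather than the live page. The HTML snippet, the hash suffixes, and the price value are all assumptions for demonstration; only the `[class^=priceText]` pattern comes from the answer.

```r
library(rvest)

# Hypothetical markup: each class has a stable prefix followed by a
# build-specific suffix, similar to what the question shows.
html <- minimal_html('
  <div class="overviewRow__66339412a5">
    <span class="priceText__06f600fa3e">1,234.56</span>
    <span class="changePercent__aaaa111122">+0.50%</span>
  </div>
')

# [class^=priceText] matches elements whose class attribute value
# starts with "priceText", whatever suffix follows.
price_text <- html |>
  html_element("[class^=priceText]") |>
  html_text()

# Strip the thousands separator before converting to a number.
price <- as.numeric(gsub(",", "", price_text))
print(price)
```

Note that `^=` tests the whole class attribute string, so it only matches when the stable prefix comes first in the attribute. If an element carried several classes in a different order, the contains operator (`*=`) might be the safer choice.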
library(httr2)
library(tidyverse)
library(rvest)
url <- "https://www.bloomberg.com/quote/WORLD:IND"
# send a browser-like user-agent header with the request
headers <- c("user-agent" = "mozilla/5.0")
page <- request(url) |>
  req_headers(!!!headers) |>
  req_perform() |>
  resp_body_html()
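# extract direct
# the value sits in URL-encoded JSON in the page text: %22 is ", %3A is :, %2C is ,
prev_price <- page |>
  html_text() |>
  str_match("previousClosingPriceOneTradingDayAgo%22%3A(\\d+\\.?\\d*)%2C") |>
  (\(m) m[, 2])() |> # the native pipe does not accept `.[, 2]` on its right-hand side
  as.numeric()
curr_price <- page |>
  html_element("[class^=priceText]") |>
  html_text() |>
  str_replace_all(",", "") |>
  as.numeric()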
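# calculate
# keep the sign of the percentage change so that falls are handled too
# (this assumes the page text uses an ASCII + or - sign)
change <- page |>
  html_element("[class^=changePercent]") |>
  html_text() |>
  str_extract("[+-]?[\\d.]+") |>
  as.numeric()
# change = (curr - prev) / prev * 100, so prev = curr / (1 + change / 100)
prev_price_calculated <- curr_price / (1 + change / 100)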
print(curr_price)
print(change)
print(prev_price)
print(prev_price_calculated)
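To check both approaches without hitting the site, here is a minimal sketch using made-up values. The sample string, the numbers, and the helper name `prev_from_change` are assumptions for illustration; the key name and the URL-encoded delimiters follow the pattern used in the code above.

```r
# Made-up sample of URL-encoded JSON (%22 is ", %3A is :, %2C is ,)
sample_text <- "%22previousClosingPriceOneTradingDayAgo%22%3A200.5%2C%22other%22%3A1"

# Option 1: capture the number that follows the encoded key
pattern <- "previousClosingPriceOneTradingDayAgo%22%3A([0-9]+\\.?[0-9]*)%2C"
m <- regmatches(sample_text, regexec(pattern, sample_text))[[1]]
prev_direct <- as.numeric(m[2]) # m[1] is the full match, m[2] the capture group

# Option 2: invert the percent-change definition.
# pct_change = (curr - prev) / prev * 100, so prev = curr / (1 + pct_change / 100).
# Multiplying by (1 - pct_change / 100) instead is only an approximation.
# prev_from_change is a hypothetical helper name.
prev_from_change <- function(curr, pct_change) curr / (1 + pct_change / 100)

curr <- 204.51 # hypothetical current price, 2% above 200.5
pct <- 2       # hypothetical signed percent change
prev_calc <- prev_from_change(curr, pct)

print(prev_direct)
print(prev_calc)
```

With real data the two results may differ slightly, since the percentage shown on the page is rounded.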