I am using RSelenium to scrape data tables from a website, iterating through many pages in a loop.
The code below successfully scrapes the table in question (although it loses the UTF-8 formatting). However, some entries in the table have a strike-through, and in those cases the code ignores the strike-through and acts as if it is not there.
Example:
Could anyone please help with how I can retain the strike-through information when I scrape the table?
My code for scraping the table:
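library(rvest)  # provides read_html(), html_table() and the %>% pipe
# remDr is the RSelenium remote driver, already navigated to the page
Data_table_html <- remDr$getPageSource()[[1]] %>%
  read_html() %>%
  html_table(header = FALSE, fill = TRUE)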
I have spent hours on this, so any help or pointers would be greatly appreciated.
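I would like to share the solution I found, below. In short, selecting the relevant HTML nodes and reading their "style" attribute with html_attr does the trick, because the strike-through is stored there as text-decoration:line-through: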
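# html_nodes() needs a parsed document, so run the page source through read_html() first
saving <- remDr$getPageSource()[[1]] %>%
  read_html() %>%
  html_nodes(xpath = 'your xpath') %>%
  html_attr("style") %>%
  gsub("text-decoration:line-through;", "0", .) # %>% html_table(fill=TRUE)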
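For anyone who wants the strike-through flag next to the cell text rather than as a separate vector of styles, here is a minimal sketch of the same idea. It rests on several assumptions: the strike-through is set through an inline style attribute on the td itself (as it was on my page), the hard-coded page_source string is a made-up stand-in for remDr$getPageSource()[[1]], and the XPath "//table//td" is only a placeholder for your own. If the site marks strike-through with a tag or a CSS class instead, this will not catch it.

```r
library(rvest)

# Hypothetical stand-in for remDr$getPageSource()[[1]]; the real markup may differ.
page_source <- '<table>
  <tr><td>alpha</td><td style="text-decoration:line-through;">beta</td></tr>
  <tr><td>gamma</td><td>delta</td></tr>
</table>'

# Assumed XPath: every cell of every table on the page.
cells <- read_html(page_source) %>% html_nodes(xpath = "//table//td")

# html_attr() returns NA for cells that have no style attribute.
style <- html_attr(cells, "style")

# Flag cells whose inline style mentions line-through.
struck <- !is.na(style) & grepl("line-through", style, fixed = TRUE)

# One row per cell: its text and whether it was struck through.
result <- data.frame(text = html_text(cells), struck = struck,
                     stringsAsFactors = FALSE)
print(result)
```

Because cells, style and struck all follow document order, the struck column can be matched back to the output of html_table() by position, provided the table has no merged cells.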