CSS Selector found with rvest and not with RSelenium


The goal is to read the 1-5yr GIC rates for Guaranteed Investment Certificate - Long-Term and Compound Interest under the Non-Cashable GICs tab.

SelectorGadget tells me that the CSS selector is #container-9565195e5e .cmp-chart__chart span. Using rvest:

page <- read_html('https://www.td.com/ca/en/personal-banking/products/saving-investing/gic-rates-canada/')
page %>% 
  html_nodes("#container-9565195e5e .cmp-chart__chart span") 

# {xml_nodeset (5)}
# [1] <span data-source="tdct-gic" data-view="single" data-filter-item="productId:315|minimumDepositAmt:0.01|minimumTermYearCnt:1" data-value="postedRate"></span>
# [2] <span data-source="tdct-gic" data-view="single" data-filter-item="productId:315|minimumDepositAmt:0.01|minimumTermYearCnt:2" data-value="postedRate"></span>
# [3] <span data-source="tdct-gic" data-view="single" data-filter-item="productId:315|minimumDepositAmt:0.01|minimumTermYearCnt:3" data-value="postedRate"></span>
# [4] <span data-source="tdct-gic" data-view="single" data-filter-item="productId:315|minimumDepositAmt:0.01|minimumTermYearCnt:4" data-value="postedRate"></span>
# [5] <span data-source="tdct-gic" data-view="single" data-filter-item="productId:315|minimumDepositAmt:0.01|minimumTermYearCnt:5" data-value="postedRate"></span>

rvest finds the spans, but they are empty: it can't read the actual rates because the site fills them in with JavaScript.
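To illustrate without hitting the site, here is a minimal sketch that parses one of the placeholder spans from the output above. It assumes rvest 1.0 or later (for minimal_html() and html_element()); the variable names html and span are only illustrative.

```r
library(rvest)

# One placeholder span, copied from the static HTML that read_html() returned above
html <- '<span data-source="tdct-gic" data-view="single" data-filter-item="productId:315|minimumDepositAmt:0.01|minimumTermYearCnt:1" data-value="postedRate"></span>'

span <- minimal_html(html) %>% html_element("span")

# The span has no text in the static HTML, so there is no rate to extract
html_text(span)
#> [1] ""

# Only the attributes describing which rate belongs here are present
html_attr(span, "data-filter-item")
#> [1] "productId:315|minimumDepositAmt:0.01|minimumTermYearCnt:1"
```

The attributes identify the product and term, but the rate itself only appears once a browser has run the page's scripts.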

Using the same CSS selector with RSelenium results in an error:

remDr$navigate("https://www.td.com/ca/en/personal-banking/products/saving-investing/gic-rates-canada/")
webElem <- remDr$findElement(using = "css", "#container-9565195e5e .cmp-chart__chart span")

# Selenium message:Unable to locate element: {"method":"css selector","selector":"#container-9565195e5e .cmp-chart__chart span"}
# For documentation on this error, please visit: http://seleniumhq.org/exceptions/no_such_element.html
# Build info: version: '2.53.1', revision: 'a36b8b1', time: '2016-06-30 17:37:03'
# System info: host: 'ef4080d2cb73', ip: '172.17.0.2', os.name: 'Linux', os.arch: 'amd64', os.version: '5.4.0-135-generic', java.version: '1.8.0_91'
# Driver info: driver.version: unknown
# 
# Error:     Summary: NoSuchElement
# Detail: An element could not be located on the page using the given search parameters.
# class: org.openqa.selenium.NoSuchElementException
# Further Details: run errorDetails method

So how do I use RSelenium to read the 1-5yr rates for Guaranteed Investment Certificate - Long-Term and Compound Interest, for Non-registered and Registered (TFSA, RSP, RIF, RESP)?


Solution

  • I replaced RSelenium with chromote (which is on its way into rvest: r4ds, gh). The selector in the question seems to refer to a different table, Long-Term and Simple Interest. The values in both tables are currently the same, but I switched to the table named in the question.

    library(chromote)
    library(rvest)
    b <- ChromoteSession$new()
    # Display the current session in the Chromote browser:
    # b$view()
    
    b$Page$navigate("https://www.td.com/ca/en/personal-banking/products/saving-investing/gic-rates-canada/")
    b$Page$loadEventFired()
    
    # Non-Cashable GICs >> Guaranteed Investment Certificate - Long-Term and Compound Interest
    b$Runtime$evaluate("document.querySelector('#container-8a263227af table').outerHTML")$result$value %>% 
      minimal_html() %>% 
      html_element("table") %>% 
      html_table()
    #> # A tibble: 5 × 2
    #>   Term    `Non-registered and Registered (TFSA, RSP, RIF, RESP)`
    #>   <chr>   <chr>                                                 
    #> 1 1 year  4.65%                                                 
    #> 2 2 years 4.35%                                                 
    #> 3 3 years 3.75%                                                 
    #> 4 4 years 4%                                                    
    #> 5 5 years 4.05%
    
    ### A few alternatives
    # evaluate JS in the runtime:
    sapply(1:5, \(x) b$Runtime$evaluate(paste0("document.querySelector('[data-filter-item=\"productId:703|minimumDepositAmt:0.01|minimumTermYearCnt:",x,"\"]').innerText"))$result$value)
    #> [1] "4.65" "4.35" "3.75" "4"    "4.05"
    
    doc <- b$DOM$getDocument()
    # elements where "data-filter-item" attribute starts with "productId:703|minimumDepositAmt:0.01|minimumTermYearCnt:"
    nodeids <- b$DOM$querySelectorAll(doc$root$nodeId, '[data-filter-item^="productId:703|minimumDepositAmt:0.01|minimumTermYearCnt:"]')
    sapply(nodeids$nodeIds, \(x) b$DOM$getOuterHTML(x) %>% minimal_html() %>% html_text())
    #> [1] "4.65" "4.35" "3.75" "4"    "4.05"
    
    # close session
    b$close()
    #> [1] TRUE
    

    Created on 2023-01-21 with reprex v2.0.2
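
The scraped values are character strings, so a follow-up step is usually needed before doing arithmetic on them. A minimal base-R sketch, assuming the table looks like the output above; the data frame is typed in by hand here, and the column names Rate, term_years and rate_pct are illustrative rather than anything the site or rvest provides:

```r
# Values copied from the html_table() output above; the long rate column is renamed to Rate
rates <- data.frame(
  Term = c("1 year", "2 years", "3 years", "4 years", "5 years"),
  Rate = c("4.65%", "4.35%", "3.75%", "4%", "4.05%")
)

# Drop the " year" / " years" suffix and convert the term to an integer
rates$term_years <- as.integer(sub(" years?$", "", rates$Term))

# Drop the percent sign and convert the rate to a number (still in percent)
rates$rate_pct <- as.numeric(sub("%", "", rates$Rate, fixed = TRUE))

rates[, c("term_years", "rate_pct")]
```

With a live session, the same two sub() calls could be applied to the tibble returned by html_table() after renaming its second column.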