
Clicking a submit link in rvest


I am trying to scrape data from a website using rvest. I read in the HTML of the page, extract the form, fill it in with rvest::html_form_set(), and then submit it. On inspecting the form, I realized it has no submit button: the button shown on the website is an anchor tag whose href points to a script. I tried rvest::session_follow_link(), but I am unable to get the data. This is the code that doesn't work:

    trademark_search_page <- rvest::session('https://ipindiaonline.gov.in/tmrpublicsearch/frmmain.aspx')
    search_form <- rvest::html_form(trademark_search_page)[[1]]

    search_form <- search_form %>%
      rvest::html_form_set(`ctl00$ContentPlaceHolder1$TBWordmark` = 'Bull',
                           `ctl00$ContentPlaceHolder1$TBClass` = 32)

    resp <- trademark_search_page %>%
      rvest::session_submit(search_form) %>%
      rvest::session_follow_link(xpath = '//a[@id = "ContentPlaceHolder1_BtnSearch"]')

Any suggestions on what I should be doing?


Solution

  • I think it might be tricky to do with rvest because the search button is a link that calls JavaScript, and rvest does not execute JavaScript. If you're open to other tools, here's how to do it with RSelenium:

    # load libraries
    library(RSelenium)
    
    # define url ---------------------------------------------------------
    url <- "https://ipindiaonline.gov.in/tmrpublicsearch/frmmain.aspx"
    
    
    # define search terms ------------------------------
    word_mark <- "Bull"
    class_search_term <- "32"
    
    # start RSelenium ------------------------------------------------------------
    
    rD <- rsDriver(browser = "firefox", port = 4548L, chromever = NULL)
    remDr <- rD[["client"]]
    
    # Navigate to webpage -----------------------------------------------------
    remDr$navigate(url)
    
    
    # fill in the form ------------------------------------------------
    # this finds the html element for each part of the form
    # and fills it in with the value we want
    
    # Wordmark
    remDr$findElement(using = "id", value = "ContentPlaceHolder1_TBWordmark")$sendKeysToElement(list(word_mark))
    
    # Class
    remDr$findElement(using = "id", value = "ContentPlaceHolder1_TBClass")$sendKeysToElement(list(class_search_term))
    
    
    # click submit button ---------------------------------------
    
    remDr$findElements("id", "ContentPlaceHolder1_BtnSearch")[[1]]$clickElement()
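
    One thing to watch for: the click triggers a page update, and the results may not be there the instant clickElement() returns. Here is a minimal sketch of a polling helper, assuming remDr is the RSelenium client created above. The name wait_for_elements is hypothetical (it is not part of RSelenium), and the .LnkshowDetails selector in the usage comment is the class of the "more details" links on the results page.

    ```r
    # Poll until at least one element matches the selector, or stop with an error.
    # `remDr` is assumed to be an RSelenium remoteDriver; `timeout` and `interval`
    # are in seconds. The helper name and defaults are illustrative only.
    wait_for_elements <- function(remDr, using, value, timeout = 10, interval = 0.5) {
      deadline <- Sys.time() + timeout
      repeat {
        # findElements() returns an empty list when nothing matches
        found <- remDr$findElements(using = using, value = value)
        if (length(found) > 0) return(found)
        if (Sys.time() >= deadline) {
          stop("Timed out waiting for elements matching: ", value)
        }
        Sys.sleep(interval)
      }
    }

    # Usage (needs the live browser session):
    # links <- wait_for_elements(remDr, "css selector", ".LnkshowDetails")
    ```

    Polling with findElements() rather than findElement() avoids wrapping every attempt in tryCatch(), since findElement() raises an error when the element is missing.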
    
    

    Submitting the search takes you to the search results page.

    Once you are on this page, you can get the list of "more details" links using rvest:

    library(rvest)
    library(magrittr)
    
    # pull html from page
    html <- remDr$getPageSource()[[1]]
    
    # find all the html elements with the .LnkshowDetails class
    # and collect their ids
    more_details_buttons <- html %>% read_html() %>%
      html_nodes(".LnkshowDetails") %>%
      html_attr("id")
    
    

    Then you could loop through all the buttons and click on them or pull data.
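
    A rough sketch of that loop, assuming remDr is the RSelenium client from above and ids is the character vector of element ids collected with rvest. The function name collect_details and the pause argument are hypothetical, and the goBack() call assumes each click replaces the results page in the same tab. I have not checked that against this site, so adjust it if the details open elsewhere.

    ```r
    # Click each "more details" link by id and keep the HTML shown after the click.
    # `remDr`: an RSelenium remoteDriver; `ids`: element ids from html_attr("id").
    # `pause` is a crude fixed wait in seconds (an assumption, tune as needed).
    collect_details <- function(remDr, ids, pause = 1) {
      ids <- ids[!is.na(ids)]              # html_attr() gives NA when an id is missing
      pages <- vector("list", length(ids))
      names(pages) <- ids
      for (id in ids) {
        remDr$findElement(using = "id", value = id)$clickElement()
        Sys.sleep(pause)                   # give the page time to update
        pages[[id]] <- remDr$getPageSource()[[1]]
        remDr$goBack()                     # assumption: return to the results list
        Sys.sleep(pause)
      }
      pages
    }

    # When you are done, shut the browser and server down:
    # remDr$close()
    # rD$server$stop()
    ```

    Each element of the returned list is a page's HTML as a string, so you can pass it to read_html() and pull out whatever fields you need.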