I am trying to scrape data from a website using rvest. I read in the HTML of the page and then extract the form. I then fill in the form using rvest::html_form_set and submit it. After looking at the form, I realized there is no submit button: the button on the website is an anchor tag whose href points to a script. I tried using rvest::session_follow_link() but was unable to get the data. This is the code that doesn't work:
library(rvest)  # attached so that the %>% pipe is available

trademark_search_page <- rvest::session('https://ipindiaonline.gov.in/tmrpublicsearch/frmmain.aspx')
search_form <- rvest::html_form(trademark_search_page)[[1]]
search_form <- search_form %>%
  rvest::html_form_set(`ctl00$ContentPlaceHolder1$TBWordmark` = 'Bull',
                       `ctl00$ContentPlaceHolder1$TBClass` = 32)
resp <- trademark_search_page %>%
  rvest::session_submit(search_form) %>%
  rvest::session_follow_link(xpath = '//a[@id = "ContentPlaceHolder1_BtnSearch"]')
Any suggestions on what I should be doing?
I think this might be tricky to do with rvest because the button references a JavaScript script. If you're open to other tools, here's how to do it with RSelenium:
# load libraries
library(RSelenium)
# define url ---------------------------------------------------------
url <- "https://ipindiaonline.gov.in/tmrpublicsearch/frmmain.aspx"
# define search terms ------------------------------
word_mark <- "Bull"
class_search_term <- "32"
# start RSelenium ------------------------------------------------------------
rD <- rsDriver(browser="firefox", port=4548L, chromever = NULL)
remDr <- rD[["client"]]
# Navigate to webpage -----------------------------------------------------
remDr$navigate(url)
# fill in the form ------------------------------------------------
# this finds the html element for each part of the form
# and fills it in with the value we want
# Wordmark
remDr$findElement(using = "id", value = "ContentPlaceHolder1_TBWordmark")$sendKeysToElement(list(word_mark))
# Class
remDr$findElement(using = "id", value = "ContentPlaceHolder1_TBClass")$sendKeysToElement(list(class_search_term))
# click submit button ---------------------------------------
remDr$findElements("id", "ContentPlaceHolder1_BtnSearch")[[1]]$clickElement()
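The click can return before the results have finished loading, so it may help to wait until the results are actually present before reading the page. Below is a minimal sketch of such a wait. `wait_for_element` is a hypothetical helper (not part of RSelenium), and it assumes the results page contains elements with the `.LnkshowDetails` class used further down.

```r
# Hypothetical helper: poll the page until at least one element matches `css`,
# or give up after `timeout` seconds. Returns TRUE if found, FALSE otherwise.
wait_for_element <- function(remDr, css, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    # findElements() returns an empty list when nothing matches
    found <- remDr$findElements(using = "css selector", value = css)
    if (length(found) > 0) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}
```

It could be called as `wait_for_element(remDr, ".LnkshowDetails")` right after the click. A fixed `Sys.sleep()` would also work but is slower and less reliable.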
Clicking the button takes you to the results page. Once you are there, you can collect the ids of the "more details" links using rvest:
library(rvest)
library(magrittr)
# pull html from page
html <- remDr$getPageSource()[[1]]
# find all the html elements with the .LnkshowDetails class
more_details_buttons <- html %>%
  read_html() %>%
  html_nodes(".LnkshowDetails") %>%
  html_attr("id")
Then you could loop through all the buttons and click on them or pull data.
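One way that loop might look is sketched below. `collect_detail_pages` is a hypothetical helper, and the sketch assumes that each details link opens in the same tab and that going back returns to the results list. I haven't verified either for this site, so treat it as a starting point.

```r
# Hypothetical helper: click each "more details" link by id and keep the
# HTML shown afterwards. `remDr` is the RSelenium client, `ids` is a
# character vector of element ids.
collect_detail_pages <- function(remDr, ids, pause = 2) {
  pages <- vector("list", length(ids))
  names(pages) <- ids
  for (id in ids) {
    remDr$findElement(using = "id", value = id)$clickElement()
    Sys.sleep(pause)                          # crude wait for the details to load
    pages[[id]] <- remDr$getPageSource()[[1]] # raw HTML as a string
    remDr$goBack()                            # assumption: details open in the same tab
    Sys.sleep(pause)
  }
  pages
}
```

Pass it the client and the vector of ids extracted above. Each returned string can then be parsed with `read_html()`. When you are done, `remDr$close()` and `rD$server$stop()` shut down the browser and the Selenium server.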