For two days, I have been trying to work out how to fill out a form and submit it to download .csv files from https://www.igb.illinois.gov/VideoReports.aspx. I can't seem to crack it, unfortunately. Full disclosure: I am a novice web scraper. I can do basic scraping, but this is new territory for me. In the end, I'm hoping to write a program that will pull down all establishments' monthly revenue reports back to September 2009.
It seems the main issue has to do with how the form is laid out. I can't seem to figure out how to designate the fields I'd like to fill in to request the .csv file. I have been using rvest and RHTMLForms. I've located the form in the Chrome dev tools and can see everything I need. I just can't seem to drill down to where I need to go to submit the query.
Here's where I've gotten so far:
library('rvest')
library('RHTMLForms')
igb <- "https://www.igb.illinois.gov/VideoReports.aspx"
igb_html <- read_html(igb)
igbForm <- html_form(igb_html)
igbForm
The issue seems to start here. The "form" has only one element, and it includes hidden inputs. The fields I want to query are toward the end. It looks like this ...
[[1]]
<form> 'aspnetForm' (POST VideoReports.aspx)
<input hidden> '__VIEWSTATE': /wEPDwUKMTU1MTExNzA3NQ9kFgJmD2QWAgIDD2QWAgIBD2QWBAIBD2QWEgIDDw8WAh4EVGV4dAUOU2VwdGVtYmVyIDIwMTJkZAIFDw8WAh8ABQ1GZWJydWFyeSAyMDIwZGQCFQ9kFgICAw8QZBAVAg5TdW1tYXJ5IHJlcG9ydA1EZXRhaWwgcmVwb3J0FQIOU3VtbWFyeSByZXBvcnQNRGV0YWlsIHJlcG9ydBQrAwJnZ2RkAhcPZBYCAgMPEA8WBh4ORGF0YVZhbHVlRmllbGQFA0tleR4NRGF0YVRleHRGaWVsZAUFVmFsdWUeC18 ....[TRUNCATE]
and at the very end, I get to what I'd like to query ...
<input radio> 'ctl00$MainPlaceHolder$SearchType': TypeStatewide
<input radio> 'ctl00$MainPlaceHolder$SearchType': TypeMuni
<input radio> 'ctl00$MainPlaceHolder$SearchType': TypeEst
<select> 'ctl00$MainPlaceHolder$SearchStateType' [1/2]
<select> 'ctl00$MainPlaceHolder$SearchMunicipality' [0/1069]
<select> 'ctl00$MainPlaceHolder$SearchEstablishment' [0/10182]
<input text> 'ctl00$MainPlaceHolder$SearchLicenseNumber':
<select> 'ctl00$MainPlaceHolder$SearchStartMonth' [1/12]
<select> 'ctl00$MainPlaceHolder$SearchStartYear' [1/9]
<select> 'ctl00$MainPlaceHolder$SearchEndMonth' [1/12]
<select> 'ctl00$MainPlaceHolder$SearchEndYear' [1/9]
<input radio> 'ctl00$MainPlaceHolder$ViewType': ViewPDF
<input radio> 'ctl00$MainPlaceHolder$ViewType': ViewCSV
I used the following to peek into what I need ...
igb_form <- getHTMLFormDescription(igb_html)
igb_form[[1]]
... and this code to locate the fields and values for each of them. For example ...
igb_form_att <- igb_form[[1]]
igb_form_att$elements[[9]]
... shows me the start month field and values from the dropdown menu ...
ctl00$MainPlaceHolder$SearchStartMonth: [ February ] January, February, March, April, May, June, July, August, September, October, November, December
I thought this would do it. So I ran the following ...
igb_fill <- set_values(igb_html,
'ctl00$MainPlaceHolder$SearchType' = 'TypeEst',
'ctl00$MainPlaceHolder$SearchEstablishment'='All Establishments',
'ctl00$MainPlaceHolder$SearchEstablishment' ='',
'ctl00$MainPlaceHolder$SearchStartMonth'='September',
'ctl00$MainPlaceHolder$SearchStartYear'='2009',
'ctl00$MainPlaceHolder$SearchEndMonth' ='February',
'ctl00$MainPlaceHolder$SearchEndYear'='2020',
'ctl00$MainPlaceHolder$ViewType'='ViewCSV')
submit_form(session=igb_html, form=igb_fill, POST(igb))
But received this error ...
Error: Unknown field names: ctl00$MainPlaceHolder$SearchType, ctl00$MainPlaceHolder$SearchEstablishment, ctl00$MainPlaceHolder$SearchStartMonth, ctl00$MainPlaceHolder$SearchStartYear, ctl00$MainPlaceHolder$SearchEndMonth, ctl00$MainPlaceHolder$SearchEndYear, ctl00$MainPlaceHolder$ViewType
Traceback:
1. set_values(igb_form, `ctl00$MainPlaceHolder$SearchType` = "TypeEst",
. `ctl00$MainPlaceHolder$SearchEstablishment` = "All Establishments",
. `ctl00$MainPlaceHolder$SearchEstablishment` = "", `ctl00$MainPlaceHolder$SearchStartMonth` = "September",
. `ctl00$MainPlaceHolder$SearchStartYear` = "2009", `ctl00$MainPlaceHolder$SearchEndMonth` = "February",
. `ctl00$MainPlaceHolder$SearchEndYear` = "2020", `ctl00$MainPlaceHolder$ViewType` = "ViewCSV")
2. stop("Unknown field names: ", paste(no_match, collapse = ", "),
. call. = FALSE)
Apologies for the long-winded question, but I've poked around a lot on this and can't seem to find an answer that gets me to where I need to go. Maybe I'm in over my head. But I'd appreciate any help! (I'm also pretty sure the submit code is wrong, but I can tackle that after this.)
There were two issues with your code. First, the set_values(...) function takes a form, not the entire HTML document, so I replaced igb_html with igb_form there. Second, the submit_form(...) function takes an html_session, so I replaced read_html(igb) with html_session(igb).
The following code should work:
library(rvest)
igb <- "https://www.igb.illinois.gov/VideoReports.aspx"
# Start a session (not just a parsed page) so the form can be submitted later
igb_html <- html_session(igb)
# html_form() returns a list of forms; take the first (and only) one
igb_form <- html_form(igb_html)[[1]]
# Fill in the fields using their full ASP.NET names
# (the duplicated SearchEstablishment argument from the question is dropped)
igb_fill <- set_values(igb_form,
'ctl00$MainPlaceHolder$SearchType' = 'TypeEst',
'ctl00$MainPlaceHolder$SearchEstablishment' = 'All Establishments',
'ctl00$MainPlaceHolder$SearchStartMonth' = 'September',
'ctl00$MainPlaceHolder$SearchStartYear' = '2009',
'ctl00$MainPlaceHolder$SearchEndMonth' = 'February',
'ctl00$MainPlaceHolder$SearchEndYear' = '2020',
'ctl00$MainPlaceHolder$ViewType' = 'ViewCSV')
# Submit via the search button, then print the resulting session
igb_html <- submit_form(igb_html, igb_fill, submit = "ctl00$MainPlaceHolder$ButtonSearch")
igb_html
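To illustrate why the first fix matters, here is a small offline sketch. It is not the real IGB page: the HTML string below is a made-up stand-in that only borrows one field name from the question. It shows that html_form() returns a list of forms, that set_values() works once a single form is picked out with [[1]], and that passing the whole page instead raises an error. The exact error text depends on the rvest version, so the sketch only captures it rather than assuming its wording. (In rvest 1.0 and later, set_values() still runs but emits a deprecation warning pointing to html_form_set().)

```r
library(rvest)

# Hypothetical stand-in for the real page: one form with a hidden input
# and one select, using a field name taken from the question.
page <- read_html('<html><body>
<form id="aspnetForm" method="post" action="VideoReports.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc">
  <select name="ctl00$MainPlaceHolder$SearchStartMonth">
    <option value="January" selected>January</option>
    <option value="September">September</option>
  </select>
</form>
</body></html>')

month_field <- "ctl00$MainPlaceHolder$SearchStartMonth"

# html_form() on a page returns a list, even if there is only one form
forms <- html_form(page)
form <- forms[[1]]
names(form$fields)

# Setting a value on the single form works
filled <- set_values(form, "ctl00$MainPlaceHolder$SearchStartMonth" = "September")
filled$fields[[month_field]]$value

# Passing the whole page instead of a form fails; capture the message
err <- tryCatch(
  set_values(page, "ctl00$MainPlaceHolder$SearchStartMonth" = "September"),
  error = function(e) conditionMessage(e)
)
err
```

The same check (printing names(form$fields) before calling set_values()) is a quick way to confirm that the field names you pass actually exist on the object you are passing them to.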