Novice to web scraping here. Similar questions have been posted (and answered), but I can't seem to apply the answers successfully. I'm trying to loop over a data set and get some scores (the percentile of atherosclerosis) using the following online calculator: https://www.mesa-nhlbi.org/Calcium/input.aspx (for example, plugging in Score = 30, gender = 0 (female), Race = 3 (white), Age = 50 will get you 94%).
However, I can't seem to get any results matching the manual execution of the calculator. This is my code (thanks in advance!):
#if(!require("devtools"))
#install.packages("devtools")
#devtools::install_github("omegahat/RHTMLForms")
#install.packages("XML")
library(XML)
library(RCurl)
library(httr)
library(tidyverse)
library(RHTMLForms)
library(rvest)
cur_url <- "https://www.mesa-nhlbi.org/Calcium/input.aspx/"
cur_session <- html_session(cur_url)
cur_Form <- html_form(cur_session)
cur_fill <- set_values(cur_Form[[1]],
Score = '30',
gender='0',
Race ='3',
Age='50')
cur_set <- submit_form(cur_session, cur_fill,submit = "Calculate")
content(cur_set$response)
Using the rvest library, I've read the URL into an "html_session" variable and extracted the form via "html_form":
cur_url <- "https://www.mesa-nhlbi.org/Calcium/input.aspx/"
cur_session <- html_session(cur_url)
cur_Form <- html_form(cur_session)
I then updated the relevant fields using the set_values function and used submit_form to execute:
cur_fill <- set_values(cur_Form[[1]], Score = '30',gender='0',Race ='3',Age='50')
cur_set <- submit_form(cur_session, cur_fill,submit = "Calculate")
However, I don't seem to get any relevant results in the cur_set variable. Any help on the matter will be greatly appreciated.
If you look at the web page in a browser (e.g., Firefox, Chrome) and open the developer console, you can see certain id fields and such that will help identify what you need.
Up front, rvest (1.0.3 in my usage) has deprecated several functions you are using. I believe it'll work for now as-is, but I'm using the recommended functions:
session() in lieu of html_session()
html_form_set() in lieu of set_values()
session_submit() in lieu of submit_form()
library(rvest)
cur_url <- "https://www.mesa-nhlbi.org/Calcium/input.aspx/"
cur_session <- session(cur_url)
cur_Form <- html_form(cur_session)
cur_fill <- html_form_set(cur_Form[[1]], Score = '30',gender='0',Race ='3',Age='50')
cur_set <- session_submit(cur_session, cur_fill,submit = "Calculate")
Various things you can get from this:
html_table(cur_set)
# [[1]]
# # A tibble: 2 × 4
#   X1    X2    X3    X4
#   <chr> <chr> <chr> <chr>
# 1 25th  50th  75th  90th
# 2 0     0     0     8
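If that two-row table is awkward to work with, it can be reshaped into a named vector. This is only a sketch: it runs on an offline stand-in page built with minimal_html() that mimics the table shown above, and the name `cutoffs` is my own. The table appears to pair percentile labels (first row) with values (second row); what those values represent is an assumption on my part.

```r
library(rvest)

# Offline stand-in for the result page, copying only the table printed above.
demo_page <- minimal_html("
  <table>
    <tr><td>25th</td><td>50th</td><td>75th</td><td>90th</td></tr>
    <tr><td>0</td><td>0</td><td>0</td><td>8</td></tr>
  </table>")

# html_table() returns a list of tibbles; take the first one.
tbl <- html_table(demo_page)[[1]]

# Row 1 holds the labels, row 2 the values; build a named numeric vector.
cutoffs <- setNames(as.numeric(unlist(tbl[2, ])), unname(unlist(tbl[1, ])))
cutoffs[["90th"]]
```

With the real result, replacing demo_page by cur_set should work the same way, assuming the page keeps this layout.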
From the browser's developer tools, we find specific elements, notably scoreLabel (30) and others. Similarly for percLabel (94) and Label10 ("16 %.").
From this,
html_nodes(cur_set, "#Label10") %>%
html_text()
# [1] "16 %."
html_nodes(cur_set, "#scoreLabel") %>%
html_text()
# [1] "30"
html_nodes(cur_set, "#percLabel") %>%
html_text()
# [1] "94"
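Since the goal was to loop over a data set, the steps above can be wrapped in a function. What follows is a minimal sketch, not a tested scraper: the helper names extract_percentile() and get_percentile() and the data frame `dat` are hypothetical, and it assumes the form fields (Score, gender, Race, Age), the "Calculate" button and the #percLabel id stay as they are on the page today. The extraction step is shown on an offline stand-in page so it can be checked without hitting the site.

```r
library(rvest)

# Hypothetical helper: read the percentile from a result page or session.
# Returns NA if #percLabel is missing or its text is not numeric.
extract_percentile <- function(page) {
  txt <- html_text(html_element(page, "#percLabel"))
  suppressWarnings(as.numeric(txt))
}

# Hypothetical wrapper: submit one set of inputs and return the percentile.
# Needs network access; field names are taken from the form used above.
get_percentile <- function(score, gender, race, age,
                           url = "https://www.mesa-nhlbi.org/Calcium/input.aspx/") {
  s <- session(url)
  form <- html_form_set(
    html_form(s)[[1]],
    Score = as.character(score), gender = as.character(gender),
    Race = as.character(race), Age = as.character(age)
  )
  extract_percentile(session_submit(s, form, submit = "Calculate"))
}

# Offline check of the extraction step on a stand-in page.
demo_page <- minimal_html('<span id="percLabel">94</span>')
extract_percentile(demo_page)

# Network-dependent usage with a made-up data frame `dat`:
# dat <- data.frame(Score = c(30, 100), gender = c(0, 1),
#                   Race = c(3, 3), Age = c(50, 60))
# dat$percentile <- mapply(get_percentile,
#                          dat$Score, dat$gender, dat$Race, dat$Age)
```

Each call opens a fresh session, which is slower but keeps the rows independent of one another. For a large data set it would be considerate to add a Sys.sleep() between requests.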