I am trying to scrape player information from a website using the following code:
# Install and load the required packages
if (!require(pacman)) install.packages("pacman")
pacman::p_load('rvest', 'stringi', 'dplyr', 'tidyr', 'measurements', 'reshape2', 'foreach', 'doParallel', 'raster', 'curl', 'httr', 'Iso')
profile_detail <- read_html('https://www.pgatour.com/players/player.01006.john-adams.html#profile') %>%
  html_node("[class='s-header__bottom']") %>%
  html_children()
But this code does not give me the desired result. Instead, I get only one node:
[1] <div class="s-header__no-data">No additional profile information available</div>
I am not sure how to access the div elements with class 's-col'.
Here is the snippet of the players info I want to extract:
Can anyone help me with this please?
Thanks in advance!
You could use div.s-col in html_nodes:
library(rvest)
url <- 'https://www.pgatour.com/players/player.06197.michael-allen.html'
url %>%
  read_html() %>%
  html_nodes('div.s-col') %>%             # select every div with class "s-col"
  html_text() %>%
  gsub('\\h+', ' ', ., perl = TRUE) %>%   # collapse runs of spaces/tabs, keep newlines
  cat()
I am not sure how you want your final output to look, but this returns:
#Michael Allen
#Full Name
#6 ft, 0 in
#183 cm
#Height
#195 lbs
#89 kg
#Weight
#January 31, 1959
#Birthday
#61
#AGE
#San Mateo, California
#Birthplace
#Scottsdale, Arizona
#Residence
#Wife, Cynthia; Christy (12/8/93), Michelle (6/3/97)
#Family
#University of Nevada (1982, Horticulture)
#College
#1984
#Turned Pro
#16,963,593
#Career Earnings
#Paradise Valley, AZ, United States
#City Plays From
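The gsub call in the pipeline above uses the PCRE class \h, which matches horizontal whitespace only (spaces and tabs), so each value and label stays on its own line. A minimal sketch of the difference from \s, using a made-up string rather than real page content:

```r
# Made-up sample text: irregular spacing plus a newline followed by a tab
x <- "6 ft,   0 in\n\t183 cm"

# \h+ collapses spaces and tabs but leaves the newline in place
collapsed <- gsub("\\h+", " ", x, perl = TRUE)

# \s+ also swallows the newline, so everything ends up on one line
flattened <- gsub("\\s+", " ", x, perl = TRUE)

cat(collapsed, "\n")
cat(flattened, "\n")
```

If you would rather have everything on a single line, swapping \h for \s is the only change needed.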
Note that some players don't have personal information on their page.
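One way to cope with such pages is to check whether any div.s-col nodes were found before extracting text. The sketch below is only an illustration: extract_profile is a hypothetical helper name, and the two HTML fragments are made-up stand-ins for real player pages, not the site's actual markup.

```r
library(rvest)

# Hypothetical helper: returns the cleaned text of every div.s-col node,
# or character(0) when the page has none.
extract_profile <- function(page) {
  nodes <- html_nodes(page, "div.s-col")
  if (length(nodes) == 0) {
    return(character(0))
  }
  text <- html_text(nodes)
  # Collapse runs of whitespace and trim both ends
  trimws(gsub("\\s+", " ", text, perl = TRUE))
}

# Made-up fragments (assumptions, not copied from pgatour.com)
with_info <- read_html('<div class="s-col">195 lbs <span>Weight</span></div>')
without_info <- read_html('<div class="s-header__no-data">No additional profile information available</div>')

info <- extract_profile(with_info)
no_info <- extract_profile(without_info)

print(info)
print(length(no_info))
```

When looping over many player URLs, an empty result can then be skipped or recorded as missing instead of causing an error later in the pipeline.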