I am currently trying to scrape the following website: https://chicago.suntimes.com/crime/archives
I have been relying on the SelectorGadget tool to find CSS selectors and XPaths for web scraping. However, I cannot get the gadget to work on this website, so I have to inspect the page source to find what I need. I have tried to locate the relevant CSS selectors and XPaths by reading through the source, but I have not managed to find them.
Could you please help me find the xpath or css for
I am sorry if this reads like a dry laundry list, but I am really stuck. I would really appreciate it if you could give me some help!
Thank you very much.
For each element that you want to extract, find the relevant tag and its class (using SelectorGadget or the browser inspector), and you'll be able to get what you want.
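library(rvest)

url <- 'https://chicago.suntimes.com/crime/archives'
# Download and parse the archive page
webpage <- url %>% read_html()
# Select each field by tag and class, then pull out its text
title <- webpage %>% html_nodes('h2.c-entry-box--compact__title') %>% html_text()
author <- webpage %>% html_nodes('span.c-byline__author-name') %>% html_text()
# trimws() strips the whitespace around the date text
date <- webpage %>% html_nodes('time.c-byline__item') %>% html_text() %>% trimws()
# data.frame() requires the three vectors to have the same length
result <- data.frame(title, author, date)
result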
result
#  title                                                                 author          date
#1 Belmont Cragin man charged with carjacking in Little Village: police  Sun-Times Wire  February 17
#2 Gas station robbed, man carjacked in Horner Park                      Jermaine Nolen  February 17
#3 8 shot, 2 fatally, Tuesday in Chicago                                 Sun-Times Wire  February 17
#4 Businesses robbed at gunpoint on the Northwest Side: police           Sun-Times Wire  February 17
#5 Man charged with carjacking in Aurora                                 Sun-Times Wire  February 16
#6 Woman fatally stabbed in Park Manor apartment                         Sun-Times Wire  February 16
#7 Woman critically hurt by gunfire in Woodlawn                          David Struett   February 16
#8 Teen boy, 17, charged with attempted carjacking in Back of the Yards  Sun-Times Wire  February 16
#...
#...
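One caveat with collecting each field across the whole page: if any article is missing a byline or a date, the three vectors end up with different lengths and data.frame() throws an error. A possible workaround, sketched below, is to select one container per article first and then use html_element() (singular), which returns a missing node, and so NA, when a field is absent. The sketch runs on a small stand-in HTML string rather than the live site. The container selector div.c-entry-box--compact is a guess based on the class names above and should be checked against the real page source. It assumes rvest 1.0.0 or later; on older versions html_nodes() and html_node() play the same roles. The last lines also show an XPath form of the title selector, since the question asked for either.

```r
library(rvest)

# Stand-in HTML so the example runs offline. The field classes are the ones
# used above; the wrapping div.c-entry-box--compact is an ASSUMPTION.
html <- '
<div class="c-entry-box--compact">
  <h2 class="c-entry-box--compact__title">First story</h2>
  <span class="c-byline__author-name">Sun-Times Wire</span>
  <time class="c-byline__item"> February 17 </time>
</div>
<div class="c-entry-box--compact">
  <h2 class="c-entry-box--compact__title">Second story</h2>
  <time class="c-byline__item"> February 16 </time>
</div>'
page <- read_html(html)

# One node per article, then at most one child node per field.
articles <- page %>% html_elements('div.c-entry-box--compact')

result <- data.frame(
  title  = articles %>% html_element('h2.c-entry-box--compact__title') %>% html_text(trim = TRUE),
  author = articles %>% html_element('span.c-byline__author-name') %>% html_text(trim = TRUE),
  date   = articles %>% html_element('time.c-byline__item') %>% html_text(trim = TRUE)
)
result

# XPath form of the title selector.
titles_xpath <- page %>%
  html_elements(xpath = "//h2[contains(@class, 'c-entry-box--compact__title')]") %>%
  html_text(trim = TRUE)
```

Here the second article has no author element, so its author comes back as NA while the title and date stay on the same row. To try this on the real page, replace the html string with the url from the code above.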