r, screen-scraping, rvest, rselenium

Could you please help me with web scraping using rvest?


I am currently trying to scrape the following website: https://chicago.suntimes.com/crime/archives

I have been relying on SelectorGadget to find the CSS selector or XPath for the elements I want to scrape. However, I am unable to use the gadget on this website, so I would have to use Inspect Source to find what I need. I have tried to find the relevant CSS and XPath by scrolling through the source, but with my limited experience I have not managed to.

Could you please help me find the xpath or css for

  • Title
  • Author
  • Date

I am sorry if this is a dry laundry list, but I am really stuck. I would really appreciate it if you could give me some help!

Thank you very much.


Solution

  • For each element that you want to extract, if you find the relevant tag with its respective class using SelectorGadget, you'll be able to get what you want.

    library(rvest)

    url <- 'https://chicago.suntimes.com/crime/archives'

    # Download and parse the page once, then query it with CSS selectors
    webpage <- url %>% read_html()
    title <- webpage %>% html_nodes('h2.c-entry-box--compact__title') %>% html_text()
    author <- webpage %>% html_nodes('span.c-byline__author-name') %>% html_text()
    date <- webpage %>% html_nodes('time.c-byline__item') %>% html_text() %>% trimws()

    # data.frame() needs all three vectors to have the same length
    result <- data.frame(title, author, date)
    result
    #                                                                                               title              author        date
    #1                               Belmont Cragin man charged with carjacking in Little Village: police       Sun-Times Wire February 17
    #2                                                   Gas station robbed, man carjacked in Horner Park       Jermaine Nolen February 17
    #3                                                              8 shot, 2 fatally, Tuesday in Chicago       Sun-Times Wire February 17
    #4                                        Businesses robbed at gunpoint on the Northwest Side: police       Sun-Times Wire February 17
    #5                                                              Man charged with carjacking in Aurora       Sun-Times Wire February 16
    #6                                                       Woman fatally stabbed in Park Manor apartment      Sun-Times Wire February 16
    #7                                                        Woman critically hurt by gunfire in Woodlawn       David Struett February 16
    #8                                Teen boy, 17, charged with attempted carjacking in Back of the Yards      Sun-Times Wire February 16
    #...
    #...
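
The approach above collects titles, authors and dates as three independent vectors, so `data.frame()` only works when every article has all three. A minimal sketch of one way to guard against that is to select each article container first and then use `html_node()` (singular) inside it, which returns a missing node, and hence `NA`, when a field is absent. The inline HTML and the container class `div.c-entry-box--compact` are assumptions made up for illustration; the real page's container would need to be confirmed with the browser's Inspect tool.

```r
library(rvest)

# Hypothetical markup that mimics the class names used in the answer.
# The second card deliberately has no author element.
html <- '
<div class="c-entry-box--compact">
  <h2 class="c-entry-box--compact__title">First story</h2>
  <span class="c-byline__author-name">Jane Doe</span>
  <time class="c-byline__item"> February 17 </time>
</div>
<div class="c-entry-box--compact">
  <h2 class="c-entry-box--compact__title">Second story</h2>
  <time class="c-byline__item"> February 16 </time>
</div>'

page <- read_html(html)

# One node per article card (assumed container class)
cards <- page %>% html_nodes('div.c-entry-box--compact')

# html_node() returns exactly one result per card, NA where nothing matches,
# so the three columns always have the same length
result <- data.frame(
  title  = cards %>% html_node('h2.c-entry-box--compact__title') %>% html_text() %>% trimws(),
  author = cards %>% html_node('span.c-byline__author-name') %>% html_text() %>% trimws(),
  date   = cards %>% html_node('time.c-byline__item') %>% html_text() %>% trimws(),
  stringsAsFactors = FALSE
)
result
```

To try this against the live site, `read_html(html)` would be replaced with `read_html(url)`, assuming the page still uses these class names.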