
How to scrape all links from a page in R


I want to scrape all the division links from a website, but I keep getting NAs. Any ideas on a fix?

library(rvest)
library(tidyverse)

pageMen = read_html('https://www.bjjcompsystem.com/tournaments/1869/categories')

get_links <- pageMen %>% html_nodes('.panel-default') %>% html_attr('href')
get_links

By adjusting the above, I managed to scrape one link, but when I inspect the elements I cannot find where all the other links are contained.

get_links <- pageMen %>% html_elements(xpath = '/html/body/div[3]/div/div/div/ul/li[1]/a') %>% html_attr('href') %>% paste0('https://www.bjjcompsystem.com',.) 
get_links

Solution

  • You could do

    library(rvest)
    library(tidyverse)
    
    pageMen <- read_html('https://www.bjjcompsystem.com/tournaments/1869/categories')
    
    get_links <- pageMen %>% 
      # select the <a> inside each category tile, not the container itself
      html_elements('.categories-grid__category a') %>% 
      html_attr('href') %>%
      paste0('https://www.bjjcompsystem.com', .)
    
    get_links[1:5]
    #> [1] "https://www.bjjcompsystem.com/tournaments/1869/categories/2053146"
    #> [2] "https://www.bjjcompsystem.com/tournaments/1869/categories/2053150"
    #> [3] "https://www.bjjcompsystem.com/tournaments/1869/categories/2053154"
    #> [4] "https://www.bjjcompsystem.com/tournaments/1869/categories/2053158"
    #> [5] "https://www.bjjcompsystem.com/tournaments/1869/categories/2053162"
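
    For context, the NAs in the original attempt come from calling html_attr('href') on the container nodes themselves, which carry no href attribute; the links live on the <a> tags nested inside them. A minimal sketch with simplified, hypothetical markup (not the site's actual HTML) illustrates the difference:

    library(rvest)
    
    # Toy document mirroring the general structure of the category grid
    html <- minimal_html('
      <ul class="categories-grid">
        <li class="categories-grid__category">
          <a href="/tournaments/1869/categories/2053146">Division A</a>
        </li>
        <li class="categories-grid__category">
          <a href="/tournaments/1869/categories/2053150">Division B</a>
        </li>
      </ul>')
    
    # Selecting the containers: they have no href, so html_attr() returns NA
    html %>% html_elements('.categories-grid__category') %>% html_attr('href')
    #> [1] NA NA
    
    # Selecting the nested <a> tags returns the actual links
    html %>% html_elements('.categories-grid__category a') %>% html_attr('href')
    #> [1] "/tournaments/1869/categories/2053146" "/tournaments/1869/categories/2053150"

    If you would rather not hard-code the base URL with paste0(), rvest also re-exports url_absolute() from xml2, e.g. url_absolute(hrefs, 'https://www.bjjcompsystem.com'), which resolves relative links against a base URL.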