Tags: python, web-scraping

How to scrape this using bs4


I need to get the element <a class="last" aria-label="Last Page" href="https://webtoon-tr.com/webtoon/page/122/">Son »</a> from this site: https://webtoon-tr.com/webtoon/

But when I try to scrape it with this code:

from bs4 import BeautifulSoup
import requests

url = "https://webtoon-tr.com/webtoon/"
html = requests.get(url).content
soup = BeautifulSoup(html,"html.parser")

last = soup.find_all("a",{"class":"last"})
print(last)

It just returns an empty list, and when I try to scrape all "a" tags, it returns only 2, which are completely unrelated links.

Can somebody help me with this? I would really appreciate it.


Solution

  • The website is protected by Cloudflare. requests, cloudscraper and requests_html did not work for me; only selenium did:

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.chrome.options import Options
    from bs4 import BeautifulSoup
    
    
    chrome_options = Options()
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--headless")
    
    webdriver_service = Service("chromedriver/chromedriver")  # path to where you saved the chromedriver binary
    browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
    
    browser.get("https://webtoon-tr.com/webtoon/")
    # the html5lib parser is a separate package (pip install html5lib)
    soup = BeautifulSoup(browser.page_source, 'html5lib')
    browser.quit()
    link = soup.select_one('a.last')
    print(link)
    

    This prints:

    <a aria-label="Last Page" class="last" href="https://webtoon-tr.com/webtoon/page/122/">Son »</a>
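
If the goal is the number of the last page rather than the tag itself, it can be read from the link's href. The following is a minimal sketch, not part of the original answer: the helper name last_page_number is hypothetical, the href is hard-coded so the snippet runs on its own (in the script above it would be link["href"]), and the URL pattern used for the other listing pages is an assumption inferred from that one href.

```python
import re
from urllib.parse import urlparse


def last_page_number(href: str) -> int:
    """Return the page number from a URL whose path ends in /page/<n>/.

    Hypothetical helper: it assumes the site keeps the /page/<n>/ pattern
    seen in the "Last Page" link.
    """
    path = urlparse(href).path
    match = re.search(r"/page/(\d+)/?$", path)
    if match is None:
        raise ValueError(f"no page number found in {href!r}")
    return int(match.group(1))


# In the selenium script this value would come from link["href"].
href = "https://webtoon-tr.com/webtoon/page/122/"
total_pages = last_page_number(href)
print(total_pages)

# Assumption: every listing page follows the same /page/<n>/ pattern.
page_urls = [
    f"https://webtoon-tr.com/webtoon/page/{n}/"
    for n in range(1, total_pages + 1)
]
print(page_urls[-1])
```

Raising ValueError when the pattern is missing makes a changed site layout fail loudly instead of silently returning a wrong page count.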