python, web-scraping, scrapy

Can't scrape a website that checks for security with Scrapy


I want to scrape https://9anime.gs/filter?keyword=oshi+n+oko. I opened it in my browser to inspect its code, but I noticed that it first shows a "Security Checking" page.

After a short while it shows "Redirecting", and for a brief moment the URL changes before returning to normal. Then the website loads.

Whenever I send a request to the site, I get the response linked here: Codes. (Inline code didn't work.)

I don't really understand what the script does.

Please give me a solution or a guideline that I can follow to scrape this site.

I tried fake user agents, but that didn't work. I also experimented a little with cookies, but I couldn't work out which cookies I need to send.


Solution

  • As Roberts also mentioned, the website has a captcha to protect itself against unwanted access. It is still possible to access the website with Selenium, though, using browser automation.

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as ec
    from selenium.webdriver.support.ui import WebDriverWait
    
    from webdriver_manager.chrome import ChromeDriverManager
    
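    # Start a real Chrome instance so the security check can complete as it does in a normal browser
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    url = 'https://9anime.se/filter?keyword=oshi+n+oko'

    driver.get(url)
    # Wait up to 60 seconds for the check and redirect to finish and the search results to become visible
    results = WebDriverWait(driver, 60).until(
        ec.visibility_of_element_located(
            (By.CSS_SELECTOR, 'section[class="block_area block_area-anime none-bg"]')  # element for search results
        )
    )
    # Call driver.quit() once you are done with the page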
    

    I am able to access the website and locate the search results section with this code.

    Note: I used the .se domain, as .gs had stopped working for me, but all of the 9anime domains have the same structure.
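
Since the question mentions not knowing which cookies to send, here is a minimal sketch of one way to hand the browser session over to Scrapy after the Selenium wait has succeeded. The helper names `selenium_cookies_to_dict` and `build_request_kwargs` are hypothetical, made up for this illustration. It is also only an assumption that the site accepts these cookies outside the browser; they may expire quickly or be tied to other browser properties, so treat this as something to try rather than a guaranteed fix.

```python
def selenium_cookies_to_dict(selenium_cookies):
    """Flatten Selenium's cookie list into a {name: value} dict.

    driver.get_cookies() returns a list of dicts that each carry
    at least the keys "name" and "value".
    """
    return {cookie["name"]: cookie["value"] for cookie in selenium_cookies}


def build_request_kwargs(driver):
    """Collect cookies and the User-Agent from a live Selenium driver.

    Hypothetical helper: the result is shaped so it can be passed to
    scrapy.Request as keyword arguments (cookies=..., headers=...).
    """
    return {
        "cookies": selenium_cookies_to_dict(driver.get_cookies()),
        # Reuse the browser's own User-Agent so it matches the cookies
        "headers": {
            "User-Agent": driver.execute_script("return navigator.userAgent")
        },
    }
```

Inside a spider this could be used as `yield scrapy.Request(url, callback=self.parse, **build_request_kwargs(driver))`. If the site rejects the request anyway, parsing `driver.page_source` directly is the fallback.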