Tags: python, python-3.x, selenium-webdriver, selenium-chromedriver, google-chrome-headless

Selenium is not scraping in the right language


I am scraping details from this website: https://buff.163.com. When I run the code with graphic mode enabled (a visible browser window), the site is scraped in English, as I want. When I disable graphic mode (headless), it is scraped in Chinese instead.

This is my code:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options


class WebScrapping:
    def __init__(self, url, graphic_mode):
        self.scraped_list = []
        self.scraped_names = []
        self.options = Options()
        self.options.add_argument("--lang=ENG")

        if graphic_mode.lower() == "enable" or graphic_mode.lower() == "on":
            # Visible browser window: the page comes back in English.
            self.url = f"{url}"
            self.driver = webdriver.Chrome(options=self.options)
            self.driver.get(self.url)
            self.driver.execute_script(f"document.title = '{url}';")
        elif graphic_mode.lower() == "disable" or graphic_mode.lower() == "off":
            # Headless browser: the page comes back in Chinese.
            self.options.add_argument("--headless")
            self.options.add_argument("--lang=ENG")
            self.url = f"{url}"
            self.driver = webdriver.Chrome(options=self.options)
            self.driver.get(self.url)
            self.driver.execute_script(f"document.title = '{url}';")
        else:
            print("Please use on or off, to enable or disable the graphic mode")

Solution

  • Try setting Chrome's preferred language through the intl.accept_languages preference to force English:

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    
    service = Service()
    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    # Accept-Language values are hyphenated language tags, most preferred first.
    prefs = {"intl.accept_languages": "en-US,en"}
    options.add_experimental_option("prefs", prefs)
    driver = webdriver.Chrome(service=service, options=options)
    # ...
    driver.quit()
    

    For a full list of locale codes to use with Selenium, see: https://seleniumbase.github.io/help_docs/locale_codes/

    Note that options.add_argument("--headless=new") selects Chrome's newer headless mode, which runs the same browser code as regular Chrome, so pages should behave much more like they do in a visible window.
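
The question's class builds its options twice, once per mode, so the two branches can drift apart. Below is a minimal sketch of one way to keep the language settings identical in both modes. The helper names language_settings and make_driver are hypothetical (they are not part of Selenium), and passing --lang alongside the preference is an assumption: its effect can vary by platform, so the intl.accept_languages preference is the part that matters here.

```python
def language_settings(locale="en-US", headless=True):
    """Return (arguments, prefs) requesting `locale` from Chrome.

    Hypothetical helper: it only builds plain Python values, so it can be
    inspected without launching a browser.
    """
    base = locale.split("-")[0]
    # Offer the bare language as a fallback, e.g. "en-US,en".
    accept = locale if base == locale else f"{locale},{base}"

    arguments = [f"--lang={locale}"]
    if headless:
        arguments.append("--headless=new")

    prefs = {"intl.accept_languages": accept}
    return arguments, prefs


def make_driver(locale="en-US", headless=True):
    """Start Chrome with the same language settings in either mode."""
    # Imported here so language_settings() works without Selenium installed.
    from selenium import webdriver

    arguments, prefs = language_settings(locale, headless)
    options = webdriver.ChromeOptions()
    for argument in arguments:
        options.add_argument(argument)
    options.add_experimental_option("prefs", prefs)
    return webdriver.Chrome(options=options)
```

To check what the browser actually reports, call driver = make_driver(headless=True), then driver.execute_script("return navigator.language"), and finish with driver.quit(). If the site still serves Chinese, it may be choosing the language from something other than the browser's language preference (for example a cookie or the visitor's IP address), which this sketch does not address.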