python python-3.x web-scraping python-requests tor

Unable to connect to Tor using requests whereas I did the same using selenium

I've written two Python scripts, one using selenium and the other using requests, that connect to http://check.torproject.org through Tor and read the text "Congratulations. This browser is configured to use Tor." to confirm that my traffic really goes through Tor.

When I use the script below, I get the text without any problem:

from selenium import webdriver
import os

# Start the Tor daemon that ships with Tor Browser (listens on SOCKS port 9050)
torexe = os.popen(r"C:\Users\WCS\Desktop\Tor Browser\Browser\TorBrowser\Tor\tor.exe")

# Tell Chrome to send all of its traffic through the Tor SOCKS proxy
options = webdriver.ChromeOptions()
options.add_argument('--proxy-server=socks5://localhost:9050')
driver = webdriver.Chrome(chrome_options=options)

driver.get("http://check.torproject.org")
# The status headline on the check page
item = driver.find_element_by_css_selector("h1.not").text
print(item)

driver.quit()

However, when I try the same thing with requests, I get the error AttributeError: 'NoneType' object has no attribute 'text':

import requests
from bs4 import BeautifulSoup
import os

torexe = os.popen(r"C:\Users\WCS\Desktop\Tor Browser\Browser\TorBrowser\Tor\tor.exe")

with requests.Session() as s:
    s.proxies['http'] = 'socks5://localhost:9050'
    res = s.get("http://check.torproject.org")
    soup = BeautifulSoup(res.text,"lxml")
    item = soup.select_one("h1.not").text
    print(item)

How can I get the same text using requests from that site?

When I use print(soup.title.text), I get the text "Sorry. You are not using Tor.", which clearly indicates that the request is not made via Tor.

Solution

check.torproject.org forces HTTPS, so requests follows a redirect to https://check.torproject.org. That second request no longer goes through the SOCKS proxy, because the proxy was only specified for the http scheme.
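To illustrate why the redirected request bypasses the proxy, here is a simplified sketch of scheme-keyed proxy lookup. The helper `proxy_for` is hypothetical and written only for this illustration; it is not requests' actual implementation, which also supports per-host and `all` keys.

```python
from urllib.parse import urlsplit


def proxy_for(url, proxies):
    """Hypothetical helper: return the proxy registered for the URL's scheme, or None."""
    return proxies.get(urlsplit(url).scheme)


# Same configuration as in the question: only the "http" scheme has a proxy.
proxies = {"http": "socks5://localhost:9050"}

print(proxy_for("http://check.torproject.org", proxies))   # socks5://localhost:9050
print(proxy_for("https://check.torproject.org", proxies))  # None
```

The initial http:// request matches the `http` key, but the https:// URL reached after the redirect matches nothing, so it is sent directly.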

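Make sure to set the proxy for both HTTP and HTTPS. Also, to resolve DNS names over Tor and not leak DNS requests, use the socks5h scheme, which lets the proxy resolve hostnames, rather than socks5, which resolves them locally.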

s.proxies['http']  = 'socks5h://localhost:9050'
s.proxies['https'] = 'socks5h://localhost:9050'

With both entries set, your test should work properly.
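Putting it together, a minimal sketch of the whole check might look like the following. The function names `tor_proxies` and `check_tor` are made up for this example, and it assumes Tor is already listening on localhost:9050 and that requests was installed with SOCKS support (`pip install requests[socks]`).

```python
def tor_proxies(host="localhost", port=9050):
    """Build a proxies mapping that covers both schemes.

    socks5h (rather than socks5) asks the proxy to resolve hostnames,
    so DNS lookups also go through Tor.
    """
    proxy = f"socks5h://{host}:{port}"
    return {"http": proxy, "https": proxy}


def check_tor(url="https://check.torproject.org"):
    """Return True if the check page reports that the request came via Tor."""
    # Imported here so tor_proxies() stays usable without requests installed.
    import requests

    with requests.Session() as s:
        s.proxies.update(tor_proxies())
        res = s.get(url, timeout=30)
        res.raise_for_status()
        # The success page contains "Congratulations", as quoted in the question.
        return "Congratulations" in res.text
```

With Tor running, calling `check_tor()` should return True; if it returns False, the request did not go through Tor.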