I've written two scripts in Python, one using selenium and the other using requests, to connect to http://check.torproject.org through Tor and read the text "Congratulations. This browser is configured to use Tor." from that page, to make sure I'm doing things the right way.
When I use the script below, I can get the text without trouble:
from selenium import webdriver
import os
torexe = os.popen(r"C:\Users\WCS\Desktop\Tor Browser\Browser\TorBrowser\Tor\tor.exe")
options = webdriver.ChromeOptions()
options.add_argument('--proxy-server=socks5://localhost:9050')
driver = webdriver.Chrome(chrome_options=options)
driver.get("http://check.torproject.org")
item = driver.find_element_by_css_selector("h1.not").text
print(item)
driver.quit()
However, when I try to do the same using requests, I get the error AttributeError: 'NoneType' object has no attribute 'text':
import requests
from bs4 import BeautifulSoup
import os
torexe = os.popen(r"C:\Users\WCS\Desktop\Tor Browser\Browser\TorBrowser\Tor\tor.exe")
with requests.Session() as s:
    s.proxies['http'] = 'socks5://localhost:9050'
    res = s.get("http://check.torproject.org")
    soup = BeautifulSoup(res.text,"lxml")
    item = soup.select_one("h1.not").text
    print(item)
How can I get the same text from that site using requests?
When I use print(soup.title.text), I get the text "Sorry. You are not using Tor.", which clearly indicates that the request is not made via Tor.
check.torproject.org forces HTTPS, so when requests follows the redirect to https://check.torproject.org you are no longer using the SOCKS proxy, since it was only specified for the http protocol.
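To illustrate why the redirect bypasses the proxy, here is a deliberately simplified model of a per-scheme proxy lookup. The helper name pick_proxy is hypothetical and this is not the code requests itself runs (the real lookup also considers host-specific and "all" keys); it only sketches the idea that a proxy registered under 'http' is not consulted for an https:// URL.

```python
from urllib.parse import urlsplit


def pick_proxy(url, proxies):
    """Simplified, hypothetical model: look up a proxy by the URL's scheme only."""
    scheme = urlsplit(url).scheme
    # dict.get returns None when no proxy is registered for this scheme,
    # which in this model means "connect directly".
    return proxies.get(scheme)


# The proxies mapping from the question: only the 'http' key is set.
question_proxies = {"http": "socks5://localhost:9050"}

first_hop = pick_proxy("http://check.torproject.org", question_proxies)
after_redirect = pick_proxy("https://check.torproject.org", question_proxies)
```

In this model the first request goes through the SOCKS proxy, while the redirected HTTPS request finds no matching entry and would go out directly.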
Make sure to set the proxy for both HTTP and HTTPS. Also, to resolve DNS names over Tor and not leak DNS requests, use socks5h instead of socks5.
s.proxies['http'] = 'socks5h://localhost:9050'
s.proxies['https'] = 'socks5h://localhost:9050'
This should result in your test working properly.
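Putting it together, the script from the question might look roughly like the sketch below. It assumes Tor is already listening on localhost:9050 and that requests was installed with SOCKS support (pip install requests[socks]); the helper names tor_proxies and fetch_tor_check_heading are made up for illustration, and the timeout value is an arbitrary choice.

```python
def tor_proxies(host="localhost", port=9050):
    """Build a proxies mapping that covers both http and https."""
    # socks5h (note the trailing h) asks the proxy to resolve hostnames,
    # so DNS lookups also go through Tor.
    proxy_url = f"socks5h://{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


def fetch_tor_check_heading():
    """Return the text of h1.not on check.torproject.org, or None if absent."""
    # Imported here so tor_proxies() stays usable without these packages.
    import requests
    from bs4 import BeautifulSoup

    with requests.Session() as s:
        s.proxies.update(tor_proxies())
        res = s.get("https://check.torproject.org", timeout=30)
        soup = BeautifulSoup(res.text, "lxml")
        heading = soup.select_one("h1.not")
        # Guard against None instead of calling .text on a missing element.
        return heading.get_text(strip=True) if heading else None
```

Calling print(fetch_tor_check_heading()) with Tor running should print the heading text; a result of None suggests the request still did not go through Tor.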