I've code for Proxy IP Rotation and user agent spoofing in order to use in scraping. But because of code was provided as an example, I don't know if it really works when I add it to my code.
I am a beginner in Python. I just add it to my .py file (after the codes that is for scraping). When I add it and start scraping it works and gets all the data but I don't know if it is working or not.
Proxy Rotation:
from lxml.html import fromstring
import requests
from itertools import cycle
import traceback
proxies = ['121.129.127.209:80', '124.41.215.238:45169', '185.93.3.123:8080', '194.182.64.67:3128', '106.0.38.174:8080', '163.172.175.210:3128', '13.92.196.150:8080']
proxies = get_proxies()
proxy_pool = cycle(proxies)
url = 'https://httpbin.org/ip'
for i in range(1,11):
proxy = next(proxy_pool)
print("Request #%d"%i)
try:
response = requests.get(url,proxies={"http": proxy, "https": proxy})
print(response.json())
except:
print("Skipping. Connnection error")
User Agent Spoofing:
import requests
import random
user_agent_list = [
#Chrome
'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36',
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
#Firefox
'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)',
'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)',
'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)'
]
url = 'https://httpbin.org/user-agent'
#Lets make 5 requests and see what user agents are used
#Using Requests
for i in range(1,6):
#Pick a random user agent
user_agent = random.choice(user_agent_list)
#Set the headers
headers = {'User-Agent': user_agent}
#Make the request
response = requests.get(url,headers=headers)
print("Request #%d\nUser-Agent Sent:%s\nUser Agent Recevied by HTTPBin:"%(i,user_agent))
print(response.content)
print("-------------------\n\n")
If you wanted to check if your proxy and user agent are rotating, you need to go to a request bin website, activate an endpoint and use that endpoint within your python code in place of what was previously requested.
You would then examine the request bin and read what is stated for user-agent and Ip address for the Get requests now listed after executing your python code.