python, web-scraping, beautifulsoup, csrf

Posting CSRF Token with multiple tokens?


I have been attempting to scrape a website that requires a login (Yelp). First question, for better understanding: I followed a few tutorials to get the idea, and noticed they all build a dictionary containing the CSRF token. However, when I scrape the Yelp login page, I find 6 tokens. I know that a dictionary can't hold duplicate keys, so is the tutorials' use of a dictionary wrong here, since I will only end up with the last token?

Secondly, if there are multiple tokens, which one do you use? Or how do you use all of them? I can't get the login to work, even after reading the documentation for BeautifulSoup and Requests and scouring Stack Overflow last night. Code below. Thanks for any explanations.

import requests
from bs4 import BeautifulSoup

s = requests.session()
login = s.get('https://www.yelp.com/login')

soup = BeautifulSoup(login.text, 'html.parser')
tokenList = soup.find_all(type = 'hidden', attrs={"name": "csrftok"})
c = login.cookies  #Just peeked into cookies to see if there is a token 
print(c)

keys = [x.attrs["name"] for x in tokenList]
values = [x.attrs["value"] for x in tokenList]
# If I print these two lists, I get 6 keys, all the string "csrftok", and 6
# different values.

email = "my email"
password = "my password"
#I tried creating a dictionary with zip of all the tokens, etc. This 
#is an attempt just using the first key and value I find.
d = {'email': email, 'password': password, keys[0]: values[0]}
response = s.post('https://www.yelp.com/login', data = d)

print(response.url)

Solution

  • Have you tried it like this? I think it should point you in the right direction:

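    import requests
    from bs4 import BeautifulSoup

    s = requests.session()
    login = s.get('https://www.yelp.com/login')

    soup = BeautifulSoup(login.text, 'lxml')  # the 'lxml' parser needs the lxml package installed
    # Take the value of the first element whose class is "csrftok"
    token = soup.select(".csrftok")[0]['value']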
    
    email = "my email"
    password = "my password"
    
    headers = {
        'accept': 'application/json, text/javascript, */*; q=0.01',
        'accept-encoding': 'gzip, deflate, br',
        'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'referer': 'https://www.yelp.com/login',
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36',
        'x-distil-ajax': 'fytrybseesxsvsresb',
        'x-requested-with': 'XMLHttpRequest',
    }

    payload = {
        'csrftok': token,
        'email': email,
        'password': password,
    }

    response = s.post('https://www.yelp.com/login/newajax', data=payload, headers=headers)
    print(response.url)
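
On the dictionary question, a minimal sketch may help. It uses made-up HTML (not Yelp's real markup; the form ids and token values are hypothetical) to show two things. First, turning repeated `csrftok` names into a dict keeps only the last value. Second, a hidden input is submitted together with the form that contains it, so one plausible approach is to read the token from inside the login form rather than taking the first match on the page. Which form id or selector applies on the real page is an assumption you would need to check in the browser's developer tools.

```python
from urllib.parse import urlencode

from bs4 import BeautifulSoup

# Hypothetical markup standing in for a page with several forms.
# This is NOT Yelp's real HTML; ids and token values are invented.
SAMPLE_HTML = """
<form id="search-form" action="/search">
  <input type="hidden" name="csrftok" value="token-for-search">
</form>
<form id="login-form" action="/login">
  <input type="hidden" name="csrftok" value="token-for-login">
  <input type="email" name="email">
  <input type="password" name="password">
</form>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Collect every hidden csrftok input on the page as (name, value) pairs.
inputs = soup.find_all("input", attrs={"type": "hidden", "name": "csrftok"})
pairs = [(tag["name"], tag["value"]) for tag in inputs]

# A dict holds one entry per key, so only the last token survives.
collapsed = dict(pairs)

# A list of tuples keeps repeated keys when form-encoded.
encoded = urlencode(pairs)

# Scope the lookup to the login form (assumed id) and take its own token.
login_form = soup.find("form", id="login-form")
login_token = login_form.find("input", attrs={"name": "csrftok"})["value"]

payload = {
    "csrftok": login_token,
    "email": "my email",
    "password": "my password",
}

print(collapsed)
print(encoded)
print(payload["csrftok"])
```

Requests also accepts a list of `(key, value)` tuples for `data` when a form really does repeat a key, though a login form normally needs only its own token.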