So I'm trying to navigate to this url:
and scrape data from the div with the class item-name item-row. There are two main problems, though: the first is that the site requires a login before you can get to that url, and the second is that most of the page is generated with JavaScript.
I believe I've solved the first problem because my login request gets a 200 response code. I'm also pretty sure that r.html.render() is supposed to solve the second problem by rendering the JavaScript-generated HTML before I scrape it. Unfortunately, the last line in my code only returns an empty list, even though Selenium had no problem getting this element. Does anyone know why this isn't working?
from requests_html import HTMLSession
from bs4 import BeautifulSoup
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'}
session = HTMLSession()
res1 = session.get('', headers=headers)
soup = BeautifulSoup(res1.content, 'html.parser')
token = soup.find('meta', {'name': 'csrf-token'}).get('content')
data = {"user": {"email": "", "password": "password"},
        "authenticity_token": token}
response = session.post('', headers=headers, data=data)
r = session.get("", headers=headers)
print(r.html.xpath("//div[@class='item-name item-row']"))
After logging in with the requests module and BeautifulSoup, you can use the link I already suggested in the comments to parse the required data, which is available as JSON. The following script should get you the name, quantity, price and a link to each product. It only returns 21 products as written. The JSON content includes a pagination option, so you can get all of the products by working through the pages.
import json
import requests
from bs4 import BeautifulSoup

baseurl = ''
data_url = ""

data = {"user": {"email": "", "password": "password"},
        "authenticity_token": ""}

headers = {
    'x-requested-with': 'XMLHttpRequest'
}

with requests.Session() as s:
    # Fetch the login page to read the CSRF token from its <meta> tag.
    res = s.get('', headers={'user-agent': 'Mozilla/5.0'})
    soup = BeautifulSoup(res.text, 'lxml')
    token = soup.select_one("[name='csrf-token']").get('content')
    data["authenticity_token"] = token

    # Log in, then request the JSON endpoint with the same session.
    s.post("", json=data, headers=headers)
    resp = s.get(data_url, headers=headers)

    for item in resp.json()['module_data']['items']:
        name = item['name']
        quantity = item['size']
        price = item['pricing']['price']
        product_page = baseurl + item['click_action']['data']['container']['path']
        print(name, quantity, price, product_page, sep='\n')
Partial output:
SB Whole Milk
1 gal
At $0.69/lb
Yellow Onion
At $1.14/lb
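
The pagination mentioned above could be handled roughly like this. It is only a sketch: the endpoint's paging scheme isn't shown in this thread, so the fetch_page helper, the 1-based page numbers and the "stop on an empty page" rule are all assumptions, and the fake data is a made-up placeholder rather than real output. Check the actual JSON for its pagination fields before relying on any of it.

```python
from typing import Callable, Dict, Iterator, List


def iter_items(fetch_page: Callable[[int], Dict], max_pages: int = 50) -> Iterator[Dict]:
    """Yield items page by page until a page comes back empty.

    fetch_page is a hypothetical helper: given a 1-based page number it
    returns the decoded JSON for that page, in the same shape the script
    above reads (payload['module_data']['items']).
    """
    for page in range(1, max_pages + 1):
        payload = fetch_page(page)
        items = payload.get('module_data', {}).get('items', [])
        if not items:  # assumed end-of-results signal
            return
        yield from items


# Stand-in for the real endpoint, filled with placeholder data.
FAKE_PAGES: Dict[int, List[Dict]] = {
    1: [{'name': 'item-a'}, {'name': 'item-b'}],
    2: [{'name': 'item-c'}],
}


def fake_fetch(page: int) -> Dict:
    return {'module_data': {'items': FAKE_PAGES.get(page, [])}}


names = [item['name'] for item in iter_items(fake_fetch)]
print(names)  # ['item-a', 'item-b', 'item-c']
```

With the logged-in session from the script, fetch_page might be something like lambda n: s.get(data_url, params={'page': n}, headers=headers).json(), assuming the endpoint accepts a page query parameter. The max_pages cap is there so a wrong guess about the stop condition can't loop forever.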