I'm trying to scrape the products on https://www.salewa.com/de-de/herren using the code below. The problem is that when next_page reaches /de-de/herren?p=4, it does not yield any items. In the browser the page uses infinite scrolling and scrolls all the way to p=9, so my code yields only 108 instead of 295 items.
At first I thought the problem was empty pages, so I tried to skip them with if len(products) > 0:, but now the spider stops on page 3 and gets no more products.
import scrapy
from scrapy.selector import Selector
import re
import json
from scrapy import Spider, Request
from datetime import datetime as dt
import csv


class Salewa_Spider(Spider):
    name = "salewa"
    allowed_domains = ["salewa.com"]
    start_urls = ["https://www.salewa.com/de-de/herren"]

    def parse(self, response):
        products = response.css('div.product--info')
        for product in products:
            yield {
                'name': product.css('h2.product--title::text').get().strip(),
                'price': product.css('span.price--default::text').get().strip(),
                'url': product.css('a.product--information-box').attrib['href'],
            }
        if len(products) > 0:
            try:
                next_page = response.css('a[class^="listing-page--nav page--next"]').attrib['href']
            except:
                next_page = []
            if next_page is not None:
                next_page_url = 'https://www.salewa.com' + next_page
                yield response.follow(next_page_url, callback=self.parse)
This happens because the infinite scroll fills in the product information through AJAX calls to a different URL.
You can find the URL for the subsequent pages in the Network tab of your browser's dev tools. You need to discover that URL and replicate it in your Scrapy requests to get the rest of the items from the infinite scroll.
For this site specifically, the API URL is "https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part={page number}&o=1&n=36&loadProducts=1", which returns a JSON object that holds the HTML elements for that page.
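To make the structure of that URL easier to see, here is a minimal sketch that rebuilds it from its query parameters using only the standard library. The helper name build_listing_url and the constant names are my own, and the reading of n=36 as the page size is an assumption based on the numbers in the question (108 items over 3 pages); only part is known to be the page number.

```python
from urllib.parse import urlencode

# Category ID and endpoint path copied from the URL quoted above.
CATEGORY_ID = "316582"
LISTING_ENDPOINT = (
    "https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/"
    + CATEGORY_ID
)


def build_listing_url(part, per_page=36):
    """Return the AJAX listing URL for one page ("part") of the infinite scroll.

    Hypothetical helper: the parameter meanings other than `part` are guesses.
    """
    params = {
        "p": 1,
        "c": CATEGORY_ID,
        "part": part,        # page number, per the URL template above
        "o": 1,
        "n": per_page,       # assumed to be the number of products per page
        "loadProducts": 1,
    }
    # dicts keep insertion order, so the query string matches the original URL
    return f"{LISTING_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    for part in range(2, 5):
        print(build_listing_url(part))
```

Building the URL this way keeps the page number in one place instead of splicing it into a long string by hand.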
What you can do is send an individual request for each of these pages, extract the HTML from the JSON object, wrap it in a Scrapy Selector, and then parse the information just like you did for the first page. Using this strategy I was able to yield 296 unique results.
For example:
from scrapy import Spider, Request
from scrapy.selector import Selector


class Salewa_Spider(Spider):
    name = "salewa"
    allowed_domains = ["salewa.com"]

    def start_requests(self):
        # Request for the first page (regular HTML)
        yield Request("https://www.salewa.com/de-de/herren")
        # Requests for the remaining pages (JSON from the AJAX endpoint)
        for i in range(2, 10):
            url = (
                "https://www.salewa.com/de-de/widgets/listing/listingCount/"
                f"sCategory/316582?p=1&c=316582&part={i}&o=1&n=36&loadProducts=1"
            )
            yield Request(url)

    def parse(self, response):
        try:
            # AJAX pages return JSON whose 'listing' key holds the product HTML;
            # wrap it in a Selector so it can be parsed like a normal page.
            response = Selector(text="<html>" + response.json()['listing'] + "</html>")
        except (ValueError, KeyError):
            # The first page is plain HTML, so response.json() fails; parse it as is.
            pass
        products = response.css('div.product--info')
        for product in products:
            yield {
                'name': product.css('h2.product--title::text').get().strip(),
                'price': product.css('span.price--default::text').get().strip(),
                'url': product.css('a.product--information-box').attrib['href'],
            }
Output:
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=2&o=1&n=36&loadProducts=1>
{'name': 'PEDROC MERINO KURZE SOCKEN HERREN', 'price': '22,00\xa0€', 'url': 'https://www.salewa.com/de-de/pedroc-merino-kurze-socken-herren-00-0000069055?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=2&o=1&n=36&loadProducts=1>
{'name': 'Lavaredo Hemp Ripstop Hose Herren', 'price': '104,00\xa0€', 'url': 'https://www.salewa.com/de-de/lavaredo-hemp-ripstop-hose-herren--00-0000028550?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=2&o=1&n=36&loadProducts=1>
{'name': "Pedroc Dry'Ton Mesh T-Shirt Herren", 'price': '60,00\xa0€', 'url': 'https://www.salewa.com/de-de/pedroc-dryton-mesh-t-shirt-herren-00-0000028584?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=2&o=1&n=36&loadProducts=1>
{'name': 'Lavaredo Hemp Pullover Herren', 'price': '100,00\xa0€', 'url': 'https://www.salewa.com/de-de/lavaredo-hemp-pullover-herren--00-0000028547?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=2&o=1&n=36&loadProducts=1>
{'name': 'Fanes 3 Layers Powertex Hemp 2 in 1 Parka Herren', 'price': '700,00\xa0€', 'url': 'https://www.salewa.com/de-de/fanes-3-layers-powertex-hemp-2-in-1-parka-herren-00-0000028666?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=2&o=1&n=36&loadProducts=1>
{'name': 'Alp Trainer 2 Schuh Herren', 'price': '170,00\xa0€', 'url': 'https://www.salewa.com/de-de/alp-trainer-2-schuh-herren-00-0000061402?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=2&o=1&n=36&loadProducts=1>
{'name': 'Fanes Engineered Merino Logo Pullover Herren', 'price': '112,00\xa0€', 'url': 'https://www.salewa.com/de-de/fanes-engineered-merino-logo-pullover-herren-00-0000028355?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=2&o=1&n=36&loadProducts=1>
{'name': 'Ortles RDS Hybrid Daunenjacke Herren', 'price': '340,00\xa0€', 'url': 'https://www.salewa.com/de-de/ortles-rds-hybrid-daunenjacke-herren-00-0000028458?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Wildfire 2 Gore-Tex® Schuh Herren', 'price': '190,00\xa0€', 'url': 'https://www.salewa.com/de-de/wildfire-2-gore-tex-schuh-herren-00-0000061414?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Puez Dolomitic 2 Durastretch Regular Hose Herren', 'price': '100,00\xa0€', 'url': 'https://www.salewa.com/de-de/puez-dolomitic-2-durastretch-regular-hose-herren-00-0000028484?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Puez Polarlite Fleece Herren', 'price': '100,00\xa0€', 'url': 'https://www.salewa.com/de-de/puez-polarlite-fleece-herren-00-0000028478?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Puez Dolomitic 2 Durastretch Kurze Hose Herren', 'price': '100,00\xa0€', 'url': 'https://www.salewa.com/de-de/puez-dolomitic-2-durastretch-kurze-hose-herren-00-0000028486?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Puez Dolomitic 2 Durastretch Lange Hose Herren', 'price': '100,00\xa0€', 'url': 'https://www.salewa.com/de-de/puez-dolomitic-2-durastretch-lange-hose-herren-00-0000028485?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Puez Polarlite Half Zip Fleece Herren', 'price': '80,00\xa0€', 'url': 'https://www.salewa.com/de-de/puez-polarlite-half-zip-fleece-herren-00-0000028481?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Tognazza Polarlite Herren Jacke', 'price': '84,00\xa0€', 'url': 'https://www.salewa.com/de-de/tognazza-polarlite-herren-jacke-00-0000027918?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Woolen 2 Layers Kapuzenjacke Herren', 'price': '220,00\xa0€', 'url': 'https://www.salewa.com/de-de/woolen-2-layers-kapuzenjacke-herren-00-0000027331?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Fanes Sarner Down Hybrid Weste Herren', 'price': '250,00\xa0€', 'url': 'https://www.salewa.com/de-de/fanes-sarner-down-hybrid-weste-herren-00-0000028017?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Lagazuoi 3 Daunen Herren Jacke', 'price': '220,00\xa0€', 'url': 'https://www.salewa.com/de-de/lagazuoi-3-daunen-herren-jacke-00-0000026705?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Sarner Wolle Hoody Herren', 'price': '270,00\xa0€', 'url': 'https://www.salewa.com/de-de/sarner-wolle-hoody-herren--00-0000026162?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Solidlogo Dri-Release® T-Shirt Herren', 'price': '32,00\xa0€', 'url': 'https://www.salewa.com/de-de/solidlogo-dri-release-t-shirt-herren-00-0000027018?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Paganella Polarlite Herren Jacke', 'price': '63,00\xa0€', 'url': 'https://www.salewa.com/de-de/paganella-polarlite-herren-jacke-00-0000027924?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Wildfire Leather Gore-Tex® Schuh Herren', 'price': '180,00\xa0€', 'url': 'https://www.salewa.com/de-de/wildfire-leather-gore-tex-schuh-herren-00-0000061416?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Brenta RDS Daunenjacke Herren', 'price': '192,00\xa0€', 'url': 'https://www.salewa.com/de-de/brenta-rds-daunenjacke-herren-00-0000027883?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Zebru Responsive Langarm Herren T-Shirt', 'price': '90,00\xa0€', 'url': 'https://www.salewa.com/de-de/zebru-responsive-langarm-herren-t-shirt-00-0000027957?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Solidlogo Dry’Ton Langarm Shirt Herren', 'price': '50,00\xa0€', 'url': 'https://www.salewa.com/de-de/solidlogo-dryton-langarm-shirt-herren-00-0000027340?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Pedroc 3 Durastretch Hose Herren', 'price': '70,00\xa0€', 'url': 'https://www.salewa.com/de-de/pedroc-3-durastretch-hose-herren--00-0000026955?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=3&o=1&n=36&loadProducts=1>
{'name': 'Pure Merino Wollstirnband', 'price': '30,00\xa0€', 'url': 'https://www.salewa.com/de-de/pure-merino-wollstirnband-00-0000028769?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=9&o=1&n=36&loadProducts=1>
{'name': 'Sesvenna Gore® Windstopper® Grip Handschuhe', 'price': '70,00\xa0€', 'url': 'https://www.salewa.com/de-de/sesvenna-gore-windstopper-grip-handschuhe-00-0000026577?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=9&o=1&n=36&loadProducts=1>
{'name': 'Rainbow Gürtel', 'price': '21,00\xa0€', 'url': 'https://www.salewa.com/de-de/rainbow-guertel-00-0000024812?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=9&o=1&n=36&loadProducts=1>
{'name': 'Hiking Gamaschen Größe M', 'price': '45,00\xa0€', 'url': 'https://www.salewa.com/de-de/hiking-gamaschen-groee-m-00-0000002117?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=9&o=1&n=36&loadProducts=1>
{'name': 'Hiking Gamaschen Größe L', 'price': '45,00\xa0€', 'url': 'https://www.salewa.com/de-de/hiking-gamaschen-groee-l-00-0000002116?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=9&o=1&n=36&loadProducts=1>
{'name': 'Approach Gamaschen', 'price': '55,00\xa0€', 'url': 'https://www.salewa.com/de-de/approach-gamaschen-00-0000002115?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=9&o=1&n=36&loadProducts=1>
{'name': 'Trekking Gamaschen', 'price': '65,00\xa0€', 'url': 'https://www.salewa.com/de-de/trekking-gamaschen-00-0000002114?c=316582&listing=1'}
2023-08-31 00:24:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.salewa.com/de-de/widgets/listing/listingCount/sCategory/316582?p=1&c=316582&part=9&o=1&n=36&loadProducts=1>
{'name': 'Fanes Regenhut mit Krempe', 'price': '40,00\xa0€', 'url': 'https://www.salewa.com/de-de/fanes-regenhut-mit-krempe-00-0000027464?c=316582&listing=1'}
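The range(2, 10) in the spider hard-codes nine pages. If the total number of products is known (the question mentions 295 items, and n=36 in the URL appears to be the page size), the page numbers could be derived instead. This is only a sketch of the arithmetic; listing_parts is a hypothetical helper, and where the total count comes from is left open.

```python
import math


def listing_parts(total_items, per_page=36):
    """Return the page ("part") numbers needed to cover total_items products.

    Hypothetical helper; per_page=36 assumes n=36 in the URL is the page size.
    """
    last_part = math.ceil(total_items / per_page)
    return range(1, last_part + 1)


if __name__ == "__main__":
    # 295 items at 36 per page: eight full pages plus a final partial page
    print(list(listing_parts(295)))
```

With 295 items this gives parts 1 through 9, which agrees with the p=9 seen in the browser and with part=9 being the last page in the log above.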