Complete Scrapy noob here. After working through the tutorials successfully, I tried to scrape a page on a website I'm collecting data from for further analysis, but the XPath I'm using continually returns nothing. The only difference I can see is that the text/URLs sit inside flexboxes on the page. I have tried every variation that works in the Elements search bar of the browser dev tools, and nothing. Is there a function I'm missing that allows access to content inside a flexbox?
URL of the page I'm trying to scrape: https://partsmasterusa.com/product-category/crown/page/2/
Samples of XPaths I've tried in the Scrapy shell that return no results:
response.xpath('//div[@id="main"]/div/div/div/div/main/div[@class="archive-products"]//div[@class="product-content"]/a/text()').get()
response.xpath('//div[@id="main"]/div/div/div/div/main/div[@class="archive-products"]//div[@class="product-content"]/a/text()').extract_first()
response.xpath('//li[contains(@class, "product-col")]//a[@href]/text()').extract_first()
etc.
An example of an XPath that returns exactly what I was expecting:
response.xpath('//div//a/span/text()').extract_first()
TIA
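This site is a bit tricky. It's a WordPress site, so it isn't fully dynamic, but the products shown on each page are loaded through AJAX calls. Scrapy only downloads the initial HTML and does not run JavaScript, so the elements you are trying to extract most likely don't exist in the response you are querying. The flexbox is a red herring: it is purely CSS layout and has no effect on what XPath can match.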
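A quick way to confirm this is to check whether the element you are targeting appears in the raw HTML the spider downloaded, as opposed to the live DOM that DevTools shows. In the Scrapy shell you can run `view(response)` to open the downloaded page in a browser, or simply test `'product-loop-title' in response.text`. The sketch below shows the same idea offline with only the standard library; the two HTML strings are hypothetical stand-ins for "before AJAX" and "after AJAX", not markup copied from the site.

```python
from html.parser import HTMLParser


class ProductLinkCounter(HTMLParser):
    """Count <a> tags whose class attribute is exactly 'product-loop-title',
    i.e. the same test as the XPath //a[@class='product-loop-title']."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a" and dict(attrs).get("class") == "product-loop-title":
            self.count += 1


def count_product_links(html: str) -> int:
    parser = ProductLinkCounter()
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical markup: what the server might send before any JavaScript runs...
RAW_HTML = '<div class="archive-products"><ul class="products"></ul></div>'

# ...versus what DevTools shows once the AJAX call has filled the list in.
RENDERED_HTML = (
    '<div class="archive-products"><ul class="products">'
    '<li class="product-col"><a class="product-loop-title" href="/p/1">'
    "<h3>Insulator (093603)</h3></a></li>"
    "</ul></div>"
)

if __name__ == "__main__":
    print(count_product_links(RAW_HTML))       # 0: nothing for XPath to find
    print(count_product_links(RENDERED_HTML))  # 1
```

If the count is zero for the HTML Scrapy received, no XPath tweak will help; the data has to come from the request that loads it.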
What you can do instead is open the Network tab of your browser's dev tools, find the POST requests the page triggers to load its products, and duplicate them in your spider.
After investigating, you will find that it makes requests to https://partsmasterusa.com/product-category/crown/page/{page_number_here}/?count=36. By duplicating this URL, the request headers, and the fields and values sent in the request body, you can retrieve all of the products and their information directly.
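Before wiring this into a spider, it can help to build the duplicated request by hand and compare it field by field with what the Network tab shows. The sketch below does that with the standard library's `urllib.request.Request` and only constructs the request without sending it. The URL, body fields and header are the ones used in this answer's spider; whether the server expects a JSON body or form-encoded data is an assumption you should confirm against the captured request.

```python
import json
from urllib.request import Request

# Values taken from the spider in this answer; confirm them against the
# request the browser actually sends (assumption: JSON-encoded body).
URL_TEMPLATE = "https://partsmasterusa.com/product-category/crown/page/{}/?count=36"
PAYLOAD = {"portoajax": True, "load_posts_only": True}
HEADERS = {"X-Requested-With": "XMLHttpRequest"}


def build_page_request(page: int) -> Request:
    """Build (but do not send) the POST request for one listing page."""
    return Request(
        URL_TEMPLATE.format(page),
        data=json.dumps(PAYLOAD).encode("utf-8"),
        headers=HEADERS,
        method="POST",
    )


if __name__ == "__main__":
    req = build_page_request(2)
    print(req.get_method(), req.full_url)
    print(req.data)  # the exact bytes that would go in the request body
```

Note that `urllib` normalizes header names with `str.capitalize()`, so the header is stored as `X-requested-with`; HTTP header names are case-insensitive, so this does not change what the server sees.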
For example:
import json

import scrapy


class PartSpider(scrapy.Spider):
    name = "partsmaster"

    def start_requests(self):
        url = "https://partsmasterusa.com/product-category/crown/page/{}/?count=36"
        # Same body fields and header the page's own AJAX call sends
        body = {"portoajax": True, "load_posts_only": True}
        headers = {"X-Requested-With": "XMLHttpRequest"}
        # One POST request per listing page (pages 1 through 541)
        for i in range(1, 542):
            yield scrapy.Request(url.format(i), method="POST", body=json.dumps(body), headers=headers)

    def parse(self, response):
        # Each product title is an <a class="product-loop-title"> wrapping an <h3>
        for lnk in response.xpath("//a[@class='product-loop-title']"):
            yield {"title": lnk.xpath('./h3/text()').get(),
                   "url": lnk.xpath('./@href').get()}
Partial output:
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'Insulator (093603)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'KEY RING (107763)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'Key Switch (146289)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'Key Switch Assembly (146286)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'Kit Label EEC (126271-(1))'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'LABEL (69395)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'Label Contactor Component Map(869419-(2))'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'Label Control Component Map(869421-(2))'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'LABEL KIT (126270)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'Label Power Component Map(869420-(2))'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'Label Pump Motor Map(869422-(2))'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'Label Traction Drive Module Map(69444)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'Label Traction Drive Module Map(869444-(1))'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'Label Traction Motor Map(869423-(2))'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'LABEL-CONTACTOR CONTROL MAP (69419)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'LABEL-CONTROL COMPONENT MAP (69421)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'LABEL-POWER COMPONENT MAP (69420)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'LABEL-PUMP MOTOR MAP (69422)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'LABEL-TRACTION MOTOR MAP (69423)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'Load Wheel (077086-201)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'Load Wheel Assembly Includes Bearings (093656-201)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'Locknut (060043–008)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'LOCKWASHER (060005-003)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'Lockwasher Without Quick Coast (060005-045)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'LT SILVER GRADE ANTI-SEIZE (065005-003)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'Manual Coast Selector Warning (069100)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'MODULE AC4820 FIN RR5200 (142885-001)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'MODULE VCM RR5200 AC SERVICE (129325-001)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'Module Warning Label(69376)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'Momentum RR Decal (069372-001.)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'MOTOR – HYDRAULIC (121659)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'Motor Nameplate (021062-008)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'Mount Wheel (084009)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'Mount Wheel (115388)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'Moving Contact (114435)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/4/?count=36>
{'title': 'Moving Vehicle Warning (069004)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': '#4 REG SPLIT LCW (060005-049)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': '1/4 INT LCW (060005-022)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': '350 BLUE HSNG (078723-006)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': 'A.C.-TRACTION DRIVE MODULE (130056)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': 'AC RR52 MOTOR 36V TRACTION (21067)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': 'AC RR52 MOTOR 36V TRACTION (21187)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': 'ACCESS 1 Display (146688)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': 'ACCESS 1 MODULE NEW- (140131)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': 'ACCESS 2 MODULE NEW- (142517-001-0S)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': 'ACCESS 3 Label(69375)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': 'ACCESS 4 MODULE NEW- (141779-001-0S)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': 'ACCESS 5 MODULE NEW- (143911-001-0S)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': 'ACCESS MODULE 2 (121611-00S)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': 'ArmInner Primary (12.2924-001)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': 'ASM HEATER RR5000 24V (129132-001)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': 'Axle (080191)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': 'Axle (116804)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': 'BAR BUS (130528)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': 'BAR BUS (130529)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': 'BAR BUS (130530)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': 'BAR BUS (130531)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': 'BAR BUS (130532)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': 'Bearing (.065081-045)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': 'BEARING – BALL SENSOR (130692)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': 'BEARING SLEEVE (130701)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': 'BEARING- (065081-043)'}
2023-06-09 15:41:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://partsmasterusa.com/product-category/crown/page/1/?count=36>
{'title': 'BLOCK – TERMINAL (21053)'}