
Scrapy - extract href from link with specific attribute value


I'm working with Scrapy. I have a list of a-carousel-card elements and I'm trying to extract the href from the second one. The following code only extracts the first link it finds. The only difference between these cards is an aria-posinset attribute with the values "1", "2", etc.

response.css("li.a-carousel-card a::attr(href)").extract_first()

I'm unsure how to extract the href from the second element in the list. I tried something like response.css("li.a-carousel-card a[aria-posinset="2"] a::attr(href)").extract_first(), but this gives me a syntax error at "2".

The first element is

<li class="a-carousel-card a-float-left" role="listitem" aria-setsize="100" aria-posinset="1" aria-hidden="false" style="margin-left: 14px;">,

while the other is

<li class="a-carousel-card a-float-left" role="listitem" aria-setsize="100" aria-posinset="2" aria-hidden="false" style="margin-left: 14px;">

The only difference between the two is the value of aria-posinset: "1" and "2".

How would I accomplish this?


Solution

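  • The syntax error occurs because the double quotes around 2 end the Python string early. Either escape the double quotes inside the string or use single quotes there. Also note that aria-posinset is set on the li element, not on the a, so the attribute selector belongs on the li. Use the following: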

    response.css("li.a-carousel-card[aria-posinset='2'] a::attr(href)").extract_first()