Tags: python, scrapy, css-selectors

Scrapy: Can't find image using CSS selector img::attr(src)


I am trying to scrape some elements on this page:

https://www.liberation.fr/planete/2015/10/26/stupeur-en-argentine-le-candidat-de-kirchner-en-difficulte_1408847/

I would like to scrape the link of the image in the article. Here is the part of the HTML where the image's link can be found:


<figure class="lead-art-wrapper"><div><div class="sc-ckMVTt hVOpns"><img src="https://www.liberation.fr/resizer/Kmpp6T1oKcLS4NfCHPYuP-bPGMk=/1024x0/filters:format(jpg):quality(70)/cloudfront-eu-central-1.images.arcpublishing.com/liberation/QGDR2IJDFAWHBV35O7NBAJONJI.jpg" width="1024px" height="0px" class="sc-GVOUr jdlgMc"></div></div><figcaption><p class="ImageMetadata__MetadataParagraph-sc-1gn0vty-0 dkGqa-d image-metadata"><span>Peu après minuit, les premiers résultats négatifs parviennent au Luna Park, stade couvert de Buenos Aires, où sont rassemblés les partisans de la présidente Cristina Kirchner.  </span>(JUAN MABROMATA/AFP)</p></figcaption></figure>

Using the Scrapy shell, I am not able to select the link of the image:

response.css('div.sc-ckMVTt img::attr(src)')

Even with:

response.css('img')

I only get the website's logo. How can I scrape the URL of the image? I need to use a CSS selector because I would like to apply it to multiple pages, and XPath would not be convenient.

Thank you very much.


Solution

  • Your image is rendered by JavaScript. If you view the page's HTML source (Ctrl+U), you will find that the markup above does not exist in the raw HTML. Scrapy cannot execute JavaScript, so a CSS selector cannot reach that image. Instead, you need to parse the image path from the JSON-like object assigned to Fusion.globalContent in the page source.
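
One way to do that parsing is sketched below. It is a minimal example, not a tested solution for this site. It assumes the page contains an assignment of the form `Fusion.globalContent={...};` inside a script tag and that the assigned value is valid JSON. The helper names `extract_global_content` and `lead_image_url`, the `promo_items -> basic -> url` key path, and the sample HTML are all assumptions made for illustration; inspect the real object to find where the image URL actually lives.

```python
import json
import re


def extract_global_content(html):
    """Return the object assigned to Fusion.globalContent, or None if not found.

    Assumes the assigned value is valid JSON (an assumption, not verified
    against the live page).
    """
    match = re.search(r"Fusion\.globalContent\s*=\s*", html)
    if match is None:
        return None
    try:
        # raw_decode parses a single JSON value starting at the given index
        # and ignores whatever follows it (the rest of the script).
        content, _end = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        return None
    return content


def lead_image_url(content):
    """Look up the lead image URL.

    ASSUMPTION: the URL sits under promo_items -> basic -> url. Adjust the
    key path after inspecting the real object.
    """
    if not isinstance(content, dict):
        return None
    promo_items = content.get("promo_items") or {}
    basic = promo_items.get("basic") or {}
    return basic.get("url")


# Made-up sample standing in for the raw page source (response.text in Scrapy).
sample_html = (
    "<script>window.Fusion=window.Fusion||{};"
    'Fusion.globalContent={"promo_items":{"basic":'
    '{"type":"image","url":"https://example.com/lead.jpg"}}};'
    "Fusion.spa=false;</script>"
)

content = extract_global_content(sample_html)
image_url = lead_image_url(content)
print(image_url)
```

In the Scrapy shell or in a spider callback, the same helpers could be called as `lead_image_url(extract_global_content(response.text))`. Because the lookup does not depend on the page's CSS class names, it should work across multiple articles as long as they embed the same object.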