scrapy css-selectors

Get specific text from 2 paragraphs in the same class using Scrapy


I'm very new to Scrapy and I want to extract the text of both paragraphs using the Scrapy shell: "Fintech, Cybersecurity" and "Serie C".

HTML

If I run

response.css('div.card-body p.card-text strong::text').get()

I get 'Secteur' but I'm looking for 'Fintech, Cybersecurity'.

For

response.css('div.card-body p.card-text::text').get() 

I get '\n'.

I've noticed if I use

response.css('div.card-body p.card-text:nth-child(3)').get() 

I get <p class="card-text">\nRound : Série C\n</p> and for

response.css('div.card-body p.card-text:nth-child(2)').get()

I get

<p class="card-text">\nSecteur : Fintech, Cybersecurity\n</p>

How do I get "Serie C" and "Fintech, Cybersecurity"?

Thank you


Solution

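  • Your selector 'div.card-body p.card-text::text' is correct. The catch is that .get() returns only the first matching text node, which here is the whitespace before the <strong> tag. Use .getall() (or the older .extract()) to get every text node.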

    Here is an example I did in ipython:

    In [2]: import parsel
    
    In [3]: html = '''<div class="card-body">
       ...:     <h3 class="card-title mb-1">L</h3>
       ...:     <p class="card-text">
       ...:         <strong>Secteur</strong>
       ...:         " : Fintech, Cybersecurity "
       ...:     </p>
       ...:     <p class="card-text">
       ...:         <strong>Round</strong>
       ...:         " : Serie C "
       ...:     </p>
       ...:     <p class="card-text">
       ...:         <small class="text-muted"> 2820 votes enregistres </small>
       ...:     </p>
       ...: </div>'''
    
    In [4]: response = parsel.Selector(html)
    
    In [5]: for p in response.css('div.card-body p.card-text::text').getall():
       ...:     text = p.strip()  # each item is already a str; trim surrounding whitespace
       ...:     if text:          # skip whitespace-only text nodes
       ...:         print(text)
       ...:
    
    " : Fintech, Cybersecurity "
    " : Serie C "
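The printed values still carry the leading " : " and the literal quotes from the sample HTML. As a minimal sketch of one way to clean them up, the helper below takes the list of strings that .getall() returns and strips whitespace, quotes and the colon from each end. The name clean_text_nodes and the raw_nodes list are hypothetical and only for illustration; the exact whitespace in raw_nodes is an assumption about what the selector yields for the sample above.

```python
def clean_text_nodes(nodes, strip_chars=' \t\r\n":'):
    """Return the non-empty values with whitespace, quotes and colons trimmed from both ends."""
    cleaned = []
    for node in nodes:
        value = node.strip(strip_chars)
        if value:  # whitespace-only nodes collapse to '' and are skipped
            cleaned.append(value)
    return cleaned


# Strings shaped like what getall() yields for the sample HTML above
# (assumed; the real whitespace may differ).
raw_nodes = [
    '\n        ',
    '\n        " : Fintech, Cybersecurity "\n    ',
    '\n        ',
    '\n        " : Serie C "\n    ',
    '\n        ',
    '\n    ',
]

print(clean_text_nodes(raw_nodes))
```

In the Scrapy shell this would presumably be called as clean_text_nodes(response.css('div.card-body p.card-text::text').getall()). Note that str.strip only trims characters at the two ends, so the comma inside "Fintech, Cybersecurity" is left alone.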