Tags: python, html, python-3.x, python-requests-html

Scrape text under <h4> using Requests-HTML (Requests-HTML, Python)


I am attempting to extract the socket type of a CPU from its PCPartPicker product page. I have identified that the socket type sits under the <h4> Socket heading.

So far I have been able to scrape .specs.block and find all the <h4> elements nested inside it. However, I can't get the text under each heading.

Here is my code:

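from requests_html import HTMLSession

session = HTMLSession()

# The product ID is part of the URL string
r = session.get('https://au.pcpartpicker.com/product/jLF48d')
# find() returns a list of matching elements; take the first .specs.block
about = r.html.find('.specs.block')[0]
# find() again returns a list, this time of every <h4> inside that block
about = about.find('h4')

print(about)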

This prints:

 [ <Element 'h4' >, <Element 'h4' >, <Element 'h4' >, <Element 'h4' >,
 <Element 'h4' >, <Element 'h4' >, <Element 'h4' >, <Element 'h4' >,
 <Element 'h4' >, <Element 'h4' >, <Element 'h4' >]

However, when I change the print statement to:

print(about.text)

I get the following error:

AttributeError: 'list' object has no attribute 'text'
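
The error arises because about.find('h4') returns a list of elements, and .text exists on each element rather than on the list itself. A minimal sketch of reading the text per item follows. It uses SimpleNamespace objects as hypothetical stand-ins for the real elements so it runs without network access, and the names "Heading 2" and "Heading 3" are placeholders, not actual page content:

```python
from types import SimpleNamespace

# Stand-ins for the Element objects returned by about.find('h4').
# "Heading 2" and "Heading 3" are placeholders, not real page content.
headings = [
    SimpleNamespace(text=t)
    for t in ("Manufacturer", "Heading 2", "Heading 3", "Socket")
]

# headings.text would raise AttributeError, because a list has no .text.
# Read .text from each element instead.
heading_texts = [h.text for h in headings]
print(heading_texts)
```

With the real list from the question, the same comprehension would be [h.text for h in about].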

Update:

print(about[0].text)

This code prints:

Manufacturer AMD

This is the first heading and its text; however, I need the fourth.

Any idea what code I can use to reach the desired result?

If you require any more information please let me know.


Solution

  • Replacing:

    print(about[0].text)

    with

    print(about[3].text)

    in the code from my question above solved the problem for me. find('h4') returns a list, and lists are zero-indexed, so index 3 selects the fourth heading (Socket).
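
A fixed index such as 3 depends on the order of the headings on the page. As a possible alternative, the entry could be looked up by its heading name. The sketch below is an assumption-laden illustration, not part of the original answer: find_by_heading is a hypothetical helper, the elements are SimpleNamespace stand-ins, and "<value>" is a placeholder for the real socket text.

```python
from types import SimpleNamespace


def find_by_heading(elements, name):
    """Return the text of the first element whose text starts with name, else None."""
    for el in elements:
        text = el.text.strip()
        if text.startswith(name):
            return text
    return None


# Stand-ins for the elements in `about`; "<value>" is a placeholder,
# not the real socket type from the page.
elements = [
    SimpleNamespace(text="Manufacturer AMD"),
    SimpleNamespace(text="Socket <value>"),
]

socket_text = find_by_heading(elements, "Socket")
print(socket_text)
```

With the real data this would be called as find_by_heading(about, 'Socket'), assuming each element's text begins with its heading name as the "Manufacturer AMD" output suggests.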