I am new to Python. I want to extract all the titles inside <a> tags that are placed in DIVs. There could be zero titles or as many as 100.
Each title sits in a child DIV, <div class="Shl zI7 iyn Hsu">, which contains an <a> tag with a title attribute.
This is the main (outer) DIV, which contains all the child DIVs:
<div class="Eqh F6l Jea k1A zI7 iyn Hsu">
<div class="Shl zI7 iyn Hsu">
<a data-test-id="search-guide" href="" title="Search for &quot;living room colors&quot;">
<div class="Jea Lfz XiG fZz gjz qDf zI7 iyn Hsu" style="white-space: nowrap; background-color: rgb(162, 152, 139);">
<div class="tBJ dyH iFc MF7 erh tg7 IZT mWe">Living</div>
</div>
</a>
In the above example, I want to get just "living room colors", not the "Search for" text in front of it in the title= value. I guess I could use a RegEx for that later, but my problem is getting the title out of the HTML in the first place.
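For the RegEx step, here is a minimal sketch, assuming the parser hands back the title as the plain string Search for "living room colors" (with any &quot; entities already decoded). extract_quoted is a hypothetical helper name, not part of any library:

```python
import re

def extract_quoted(title):
    """Return the text inside the first pair of double quotes, or None if there is none."""
    # [^"]* stops at the closing quote, so only the quoted phrase is captured.
    match = re.search(r'"([^"]*)"', title)
    return match.group(1) if match else None

print(extract_quoted('Search for "living room colors"'))  # living room colors
```

Returning None for a title without quotes lets the caller decide whether to fall back to the full title.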
I have tried the following Python:
import requests
from bs4 import BeautifulSoup
url = "https://www.pinterest.com/search/pins/?q=room%20color"
get_url = requests.get(url)
get_text = get_url.text
soup = BeautifulSoup(get_text, "html.parser")
DivTitle = soup.select('a.Shl.zI7.iyn.Hsu')[0].text.strip()
print(DivTitle)
I get: IndexError: list index out of range
When I search for the above keyword, more than one title (suggested keyword) appears in the search results.
I'd appreciate your help.
EDIT: OK, I got this working, but I am trying to make it work by parsing the page fetched from the URL instead of pasting the HTML into my code.
Here is the part that I used:
import requests
from bs4 import BeautifulSoup
vgm_url = 'https://www.pinterest.com/search/pins/?q=skin%20care'
html_text = requests.get(vgm_url).text
soup = BeautifulSoup(html_text, 'html.parser')
but I get nothing back, and no error either.
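One possible cause (an assumption; the thread does not confirm it) is that the page builds these links with JavaScript after it loads, so the HTML that requests downloads never contains them. A quick way to check is to count the links in whatever HTML you actually received. count_search_guides is a hypothetical helper, and it keys on the data-test-id="search-guide" attribute from the snippet above rather than the short class names:

```python
from bs4 import BeautifulSoup

def count_search_guides(html_text):
    """Count <a data-test-id="search-guide"> links in an HTML string.

    Debugging aid: a result of 0 for a fetched page means the links are
    not present in the HTML that was downloaded.
    """
    soup = BeautifulSoup(html_text, 'html.parser')
    # CSS attribute selector: <a> elements whose data-test-id equals "search-guide".
    return len(soup.select('a[data-test-id="search-guide"]'))

# Offline checks with small hand-written strings:
print(count_search_guides('<div><a data-test-id="search-guide" title="x"></a></div>'))  # 1
print(count_search_guides('<div id="root"></div>'))  # 0
```

To try it on the live page, pass it the downloaded text, e.g. count_search_guides(requests.get(vgm_url).text). If that prints 0, changing the selector will not help, because the elements are not in that HTML at all.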
Your selector is wrong: the DIV is the element that has the classes you want, and the A is a child of that DIV. title is an attribute of the A element, so read it with a['title'] rather than .text.
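To see why the original selector raised IndexError, here is a small sketch that runs both selectors against a trimmed copy of the fragment (trimmed by me; the &quot; entities in the title are an assumption about how the page encodes the inner quotes):

```python
from bs4 import BeautifulSoup

html = '''
<div class="Eqh F6l Jea k1A zI7 iyn Hsu">
  <div class="Shl zI7 iyn Hsu">
    <a data-test-id="search-guide" href="" title="Search for &quot;living room colors&quot;">
      <div class="tBJ dyH iFc MF7 erh tg7 IZT mWe">Living</div>
    </a>
  </div>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# 'a.Shl.zI7.iyn.Hsu' means "an <a> that itself has all four classes".
# No <a> has them, so select() returns [] and indexing [0] fails.
wrong = soup.select('a.Shl.zI7.iyn.Hsu')

# 'div.Shl.zI7.iyn.Hsu a' means "an <a> anywhere inside such a <div>".
right = soup.select('div.Shl.zI7.iyn.Hsu a')

print(len(wrong))         # 0
print(len(right))         # 1
print(right[0]['title'])  # Search for "living room colors"
```

The space in the second selector is the descendant combinator; without it, every class is required on the same element.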
from bs4 import BeautifulSoup
data = '''\
<html>
<head>
<meta name="generator"
content="HTML Tidy for HTML5 (experimental) for Windows https://github.com/w3c/tidy-html5/tree/c63cc39" />
<title></title>
</head>
<body>
<div class="Eqh F6l Jea k1A zI7 iyn Hsu">
<div class="Shl zI7 iyn Hsu">
<a data-test-id="search-guide" href="" title="Search for &quot;living room colors&quot;">
<div class="Jea Lfz XiG fZz gjz qDf zI7 iyn Hsu" style="white-space: nowrap; background-color: rgb(162, 152, 139);">
<div class="tBJ dyH iFc MF7 erh tg7 IZT mWe">Living</div>
</div>
</a>
</div>
</div>
</body>
</html>
'''
soup = BeautifulSoup(data, 'html.parser')
a = soup.select('div.Shl.zI7.iyn.Hsu a')[0]
print(a['title'])
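Since the question asks for every title (anywhere from 0 to 100), a sketch that iterates over select() instead of taking [0] avoids the IndexError when nothing matches. guide_titles is a hypothetical helper, and the second link in the sample ("bedroom colors") is made-up test data:

```python
from bs4 import BeautifulSoup

def guide_titles(html_text):
    """Return the title of every <a> inside a div.Shl.zI7.iyn.Hsu (possibly an empty list)."""
    soup = BeautifulSoup(html_text, 'html.parser')
    # Iterating handles zero matches and many matches the same way.
    # a[title] skips any <a> without a title attribute, avoiding a KeyError.
    return [a['title'] for a in soup.select('div.Shl.zI7.iyn.Hsu a[title]')]

sample = '''
<div class="Shl zI7 iyn Hsu"><a href="" title="Search for &quot;living room colors&quot;">Living</a></div>
<div class="Shl zI7 iyn Hsu"><a href="" title="Search for &quot;bedroom colors&quot;">Bedroom</a></div>
'''
print(guide_titles(sample))       # ['Search for "living room colors"', 'Search for "bedroom colors"']
print(guide_titles('<div></div>'))  # []
```

Each returned title can then go through a regular expression if you only want the quoted phrase.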