I am trying to parse a web page with Beautiful Soup (for the first time in my life) and I am hitting a strange error. There is a tag nested within another tag in the HTML structure, and I keep getting this error:
AttributeError: 'NoneType' object has no attribute 'text'
The structure of the HTML is as follows:
The whole grid of items on the page is inside a div with class "properties-previews". It contains a div with class "preview" for each item, and that div has two children: "preview-media" for the photo and "preview-content" for the text info I need to parse. "preview-content" holds an <a> tag that contains two <span> tags with the price and the area of the item, and an <h2> tag with the territory, which I also need.
<div class="properties-previews">
<div class="preview">
<div class="preview-media">
<div class="preview-content">
<a href="/properties/1042-us-highway-1-hancock-me-04634/1330428"
class="preview__link">
<span class="preview__price">$89,900</span>
<span class="preview__size">1 ac</span>
<div class="preview__subtitle">
<h2 class="-g-truncated preview__subterritory">Hancock County
</h2>
<span class="preview__extended">-- sq ft</span>
</div>
</a>
So I am trying to extract $89,900 from preview__price, 1 ac from preview__size, and Hancock County from preview__subterritory.
My Python code so far looks something like this (I have omitted all imports and requests):
landplots = soup.find_all('div', class_ = 'properties-previews')
for l in landplots:
    plot_price = l.find('span', {"class": 'preview_price'})
    plot_square = l.find('span', {"class": 'preview_size'})
    plot_county = l.find('h2', class_ = '-g-truncated preview__subterritory').text
    plot_location = l.find('span', class_ = 'preview__locality -g-truncated').text
    print(plot_price).text
    print(plot_county)
What am I doing wrong? My understanding is that when a tag is nested within another tag there should be some special syntax to get its text, but the error saying there is no text at all (on both prints) confuses me a lot. Please help!
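The class names in the HTML use a double underscore (preview__price, preview__size), not a single one as in your code, so find() matches nothing and returns None. In addition, print(plot_price).text reads .text from the return value of print(), which is always None. Once the class names match, each value is a text node inside its tag, so you can call .find_next(text=True) to extract the desired items: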
html='''
<div class="properties-previews">
<div class="preview">
<div class="preview-content">
<a class="preview__link" href="/properties/1042-us-highway-1-hancock-me-04634/1330428">
<span class="preview__price">
$89,900
</span>
<span class="preview__size">
1 ac
</span>
<div class="preview__subtitle">
<h2 class="-g-truncated preview__subterritory">
Hancock County
</h2>
<span class="preview__extended">
-- sq ft
</span>
</div>
</a>
</div>
</div>
</div>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
#print(soup.prettify())
landplots = soup.find_all('div', class_='preview-content')
for l in landplots:
    plot_price = l.find('span', {"class": 'preview__price'}).find_next(text=True).get_text(strip=True)
    plot_square = l.find('span', {"class": 'preview__size'}).find_next(text=True).get_text(strip=True)
    plot_county = l.find('h2', class_='-g-truncated preview__subterritory').find_next(text=True).text
    print(plot_price)
    print(plot_square)
Output:
$89,900
1 ac
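To see where the AttributeError in the question comes from, here is a minimal sketch. The trimmed HTML string and the names card, wrong, right and error_message are made up for illustration; only find() and get_text() are Beautiful Soup calls.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of one listing, for illustration only.
html = """
<div class="preview-content">
  <a class="preview__link" href="#">
    <span class="preview__price">$89,900</span>
    <span class="preview__size">1 ac</span>
  </a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
card = soup.find("div", class_="preview-content")

# Single underscore: no element has this class, so find() returns None.
wrong = card.find("span", class_="preview_price")
print(wrong)  # None

# Double underscore matches the class name used in the markup.
right = card.find("span", class_="preview__price")
print(right.get_text(strip=True))  # $89,900

# print() returns None, so reading .text from its result fails
# even when the tag itself was found.
try:
    print(right).text
except AttributeError as exc:
    error_message = str(exc)
print(error_message)
```

So there are two separate causes: the misspelled class names, and .text placed outside the print() call. Writing print(plot_price.text) with the corrected class name avoids both.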
Update: The same code works without any issues against the HTML DOM of the live page:
import requests
from bs4 import BeautifulSoup
url='https://www.landsearch.com/industrial/united-states/p1'
res= requests.get(url)
soup = BeautifulSoup(res.content,'lxml')
landplots = soup.find_all('div', class_='preview-content')
for l in landplots:
    plot_price = l.find('span', {"class": 'preview__price'}).find_next(text=True).get_text(strip=True)
    plot_square = l.find('span', {"class": 'preview__size'}).find_next(text=True).get_text(strip=True)
    plot_county = l.find('h2', class_='-g-truncated preview__subterritory').find_next(text=True).text
    print(plot_price)
    print(plot_square)
Output:
$89,900
1 ac
$995,000
2.32 ac
$85,000
0.93 ac
$888,000
11 ac
$599,000
21.6 ac
$225,000
3.72 ac
$100,000
6.5 ac
$75,000
4.48 ac
$749,000
8.2 ac
$225,000
84.5 ac
$225,000
84.5 ac
$275,000
29 ac
$275,000
29 ac
$40,000
0.22 ac
$2,330,000
2.8 ac
$535,000
3.71 ac
$169,900
34 ac
$499,000
1 ac
$299,000
2.53 ac
$299,000
2.53 ac
$299,000
2.53 ac
$799,000
2 ac
$199,000
0.79 ac
$997,600
3.27 ac
$699,000
1.71 ac
$529,000
1 ac
$499,900
1 ac
$50,000
1.14 ac
$250,000
55 ac
$50,000
1.14 ac
$11,000,000
31.4 ac
$1,200,000
1.68 ac
$94,900
85 ac
$896,000
2.38 ac
$189,000
1 ac
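The loop above still raises the same AttributeError as soon as a listing lacks one of the fields, because find() returns None for it. One possible way to guard against that is sketched below with CSS selectors. The helper text_or_none, the sample HTML and the span.preview__locality selector (adapted from the class name in the question) are assumptions for illustration, not something taken from the live page.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the snippet in the question (an assumption,
# not a copy of the live page).
html = """
<div class="properties-previews">
  <div class="preview">
    <div class="preview-content">
      <a class="preview__link" href="#">
        <span class="preview__price">$89,900</span>
        <span class="preview__size">1 ac</span>
        <div class="preview__subtitle">
          <h2 class="-g-truncated preview__subterritory">Hancock County
          </h2>
        </div>
      </a>
    </div>
  </div>
</div>
"""


def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


soup = BeautifulSoup(html, "html.parser")
plots = []
for card in soup.select("div.preview-content"):
    plots.append({
        "price": text_or_none(card, "span.preview__price"),
        "size": text_or_none(card, "span.preview__size"),
        "county": text_or_none(card, "h2.preview__subterritory"),
        # Not present in the sample markup, so this comes back as None.
        "locality": text_or_none(card, "span.preview__locality"),
    })

for plot in plots:
    print(plot)
```

Collecting each listing into a dict keeps price, size and county together, and a missing field shows up as None instead of stopping the loop.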