I've been trying for hours to find the right soup.select_one or find_next combination to extract the Zestimate value from the HTML below. Can you help me with the BeautifulSoup code?
Here's the URL:
https://www.zillow.com/homedetails/8612-Silverthorne-St-Austin-TX-78744/251036192_zpid/
I'm trying to return $486,997 from this markup:
<div id="home-details-home-values">
<h2>Home Value</h2>
<div class="zestimate-summary">
<div class="zsg-content-component zestimate-above-toggle">
<div class="primary-zestimate-item">
<div>
<div class="title zsg-h3 zsg-content_collapsed"><span tabindex="0" role="button"><span class="ds-dashed-underline">Zestimate</span></span></div>
<div class="content">
<div class="zestimate-value">$486,997</div>
</div>
</div>
<div class="left-spacer"></div>
<div class="right-spacer"></div>
<div class="zillow-offers-upsell-wrapper">
<div class="sc-kgoBCf pnJxW">
<div class="zsg-h3 zsg-content_collapsed">Zillow Offer</div>
<a href="/offers/?t=omhdp-zestimate&zpid=251036192">Get your Zillow Offer</a>
</div>
</div>
</div>
<div class="secondary-zestimate-items">
<div class="zsg-lg-1-3 zsg-md-1-1 secondary-row">
<span class="zestimate-icon"><img src="data:image/svg+xml;base64,PHN2ZyB3aWR0aD0iNTYiIGhlaWdodD0iNTYiIHZpZXdCb3g9IjAgMCA1NiA1NiIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIiB4bWxuczp4bGluaz0iaHR0cDovL3d3dy53My5vcmcvMTk5OS94bGluayI+PHRpdGxlPlplc3RpbWF0ZV9SYW5nZTwvdGl0bGU+PGRlZnM+PGVsbGlwc2UgaWQ9ImEiIGN4PSIyOCIgY3k9IjI4IiByeD0iMjgiIHJ5PSIyOCIvPjxtYXNrIGlkPSJjIiB4PSIwIiB5PSIwIiB3aWR0aD0iNTYiIGhlaWdodD0iNTYiIGZpbGw9IiNmZmYiPjx1c2UgeGxpbms6aHJlZj0iI2EiLz48L21hc2s+PHBhdGggZD0iTTIzLjgwNCAxMy41MDF2MTAuNTExYzAgLjY0OC0uMzI1IDEuNTEyLTEuNTEzIDEuNTEyaC01Ljk0VjE0Ljc2MmgtNS45NHYxMC43NjJINC40N2MtMS4xODggMC0xLjUxMi0uODY0LTEuNTEyLTEuNTEydi0xMC41MUguNThjLS44NjQgMC0uNjQ4LS40MzMtLjEwOC0xLjA4TDEyLjM1NC40MzFjLjMyNC0uMzI0LjY0OS0uNDMyIDEuMDgtLjQzMi40MzMgMCAuNzU3LjIxNiAxLjA4LjQzMmwxMS44ODIgMTEuOTljLjY0OC42NDcuODY0IDEuMDgtLjEwOCAxLjA4aC0yLjQ4NHoiIGlkPSJiIi8+PG1hc2sgaWQ9ImQiIHg9IjAiIHk9IjAiIHdpZHRoPSIyNi45NSIgaGVpZ2h0PSIyNS41MjQiIGZpbGw9IiNmZmYiPjx1c2UgeGxpbms6aHJlZj0iI2IiLz48L21hc2s+PC9kZWZzPjxnIHN0cm9rZT0iIzAwNzRFNCIgc3Ryb2tlLXdpZHRoPSIyIiBmaWxsPSIjRkZGIiBmaWxsLXJ1bGU9ImV2ZW5vZGQiPjx1c2UgbWFzaz0idXJsKCNjKSIgeGxpbms6aHJlZj0iI2EiLz48dXNlIG1hc2s9InVybCgjZCkiIHhsaW5rOmhyZWY9IiNiIiB0cmFuc2Zvcm09InRyYW5zbGF0ZSgxNSAxNSkiLz48L2c+PC9zdmc+" role="presentation"></span>
<div class="secondary-wrapper">
<div class="title zsg-h4 zsg-content_collapsed"><span tabindex="0" role="button"><span class="ds-dashed-underline">Zestimate Range</span></span></div>
<div class="content">$463,000 - $511,000</div>
</div>
</div>
<div class="zsg-lg-1-3 zsg-md-1-1 secondary-row">
<span class="zestimate-icon"><img src="data:image/svg+xml;base64,PHN2ZyB3aWR0aD0iNTYiIGhlaWdodD0iNTYiIHZpZXdCb3g9IjAgMCA1NiA1NiIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIiB4bWxuczp4bGluaz0iaHR0cDovL3d3dy53My5vcmcvMTk5OS94bGluayI+PHRpdGxlPjMwX0RheXNfRG93bjwvdGl0bGU+PGRlZnM+PGVsbGlwc2UgaWQ9ImEiIGN4PSIyOCIgY3k9IjI4IiByeD0iMjgiIHJ5PSIyOCIvPjxtYXNrIGlkPSJjIiB4PSIwIiB5PSIwIiB3aWR0aD0iNTYiIGhlaWdodD0iNTYiIGZpbGw9IiNmZmYiPjx1c2UgeGxpbms6aHJlZj0iI2EiLz48L21hc2s+PHBhdGggZD0iTTI4LjcwNiAxMy43NjVMMTYuNDcgMS41MjlDMTYgMS4wNiAxNS40MS44MjQgMTQuNzA2LjgyNGMtLjcwNiAwLTEuMjk0LjIzNS0xLjY0Ny43MDVMLjcwNiAxMy43NjVjLS40Ny40Ny0uNzA2IDEuMDU5LS43MDYgMS43NjQgMCAuNzA2LjIzNSAxLjE3Ny43MDYgMS42NDdsMS40MTIgMS40MTJjLjQ3LjQ3IDEuMDU4LjcwNiAxLjY0Ny43MDYuNzA2IDAgMS4yOTQtLjIzNSAxLjY0Ny0uNzA2bDUuNTMtNS41M3YxMy4yOTVjMCAuNzA2LjIzNCAxLjE3Ni43MDUgMS42NDdhMi44OSAyLjg5IDAgMCAwIDEuNzY1LjU4OGgyLjQ3QTIuODkgMi44OSAwIDAgMCAxNy42NDcgMjhjLjQ3LS4zNTMuNzA2LS45NDEuNzA2LTEuNjQ3VjEzLjA1OWw1LjUzIDUuNTNjLjQ3LjQ3IDEuMDU4LjcwNSAxLjY0Ni43MDUuNzA2IDAgMS4yOTUtLjIzNSAxLjc2NS0uNzA2bDEuNDEyLTEuNDEyYy40Ny0uNDcuNzA2LTEuMDU4LjcwNi0xLjY0NyAwLS43MDUtLjIzNi0xLjI5NC0uNzA2LTEuNzY0eiIgaWQ9ImIiLz48bWFzayBpZD0iZCIgeD0iMCIgeT0iMCIgd2lkdGg9IjI5LjQxMiIgaGVpZ2h0PSIyNy43NjUiIGZpbGw9IiNmZmYiPjx1c2UgeGxpbms6aHJlZj0iI2IiLz48L21hc2s+PC9kZWZzPjxnIHN0cm9rZT0iIzAwNzRFNCIgc3Ryb2tlLXdpZHRoPSIyIiBmaWxsPSIjRkZGIiBmaWxsLXJ1bGU9ImV2ZW5vZGQiPjx1c2UgbWFzaz0idXJsKCNjKSIgeGxpbms6aHJlZj0iI2EiLz48dXNlIG1hc2s9InVybCgjZCkiIHhsaW5rOmhyZWY9IiNiIiB0cmFuc2Zvcm09Im1hdHJpeCgxIDAgMCAtMSAxMyA0MykiLz48L2c+PC9zdmc+" role="presentation"></span>
<div class="secondary-wrapper">
<div class="title zsg-h4 zsg-content_collapsed">Last 30 Day Change</div>
<div class="content">-$2,830 <span class="percent-decrease">(-0.6 %)</span></div>
</div>
</div>
</div>
</div>
<div class="toggle-section">
<div class="zsg-content-component module-separator hide">
<div class="additional-zestimate-info zsg-wrapper-body-hidden"></div>
</div>
<div class="zsg-content-item"><a class="toggle zsg-lg-1-1 zsg-centered">Zestimate history & details <span class="zsg-icon-expando-down"></span></a></div>
</div>
</div>
</div>
Here's the code I'm working with:
req_headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.8',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'
}

for link in df['links']:
    r = s.get(link, headers=req_headers)
    soup = BeautifulSoup(r.content, 'html.parser')
    # soup = BeautifulSoup(requests.get(url, headers=req_headers).content, 'html.parser')
    results = soup.select_one('h4:contains("Home value")').find_next('p').get_text(strip=True)
    print(results)
Based on my previous answer: it seems Zillow serves more than one type of page to users. First check that you aren't getting a captcha page. If you aren't, use this script:
import requests
from bs4 import BeautifulSoup

url = 'https://www.zillow.com/homedetails/8612-Silverthorne-St-Austin-TX-78744/251036192_zpid/'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')

# First page type: an <h4> "Home value" heading followed by a <p> holding the price
home_value = soup.select_one('h4:contains("Home value")')
if not home_value:
    # Second page type: the price is the last token inside the .zestimate element
    home_value = soup.select_one('.zestimate').text.split()[-1]
else:
    home_value = home_value.find_next('p').get_text(strip=True)

print(home_value)
Prints:
$486,997
For url = 'https://www.zillow.com/homedetails/1404-Clearwing-Cir-Georgetown-TX-78626/121721750_zpid/'
it prints:
$324,493
Probably more testing is needed.
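Since the page layout varies, it may help to wrap the lookups in one function that tries each layout in turn and returns None instead of raising. The following is only a sketch under a few assumptions: the names extract_zestimate and looks_like_captcha are made up for illustration, the captcha check is a crude substring heuristic rather than anything Zillow documents, and the first selector targets the div.zestimate-value element from the HTML in the question. It runs against an inline HTML fragment, so it can be tried without hitting the site.

```python
import re
from bs4 import BeautifulSoup


def looks_like_captcha(html):
    """Crude heuristic (an assumption): a block page mentions 'captcha'."""
    return "captcha" in html.lower()


def extract_zestimate(html):
    """Return the Zestimate text (e.g. '$486,997'), or None if no layout matches."""
    soup = BeautifulSoup(html, "html.parser")

    # Layout from the question: <div class="zestimate-value">$486,997</div>
    tag = soup.select_one("div.zestimate-value")
    if tag:
        return tag.get_text(strip=True)

    # Layout from the script above: <h4>Home value</h4> followed by a <p>
    heading = soup.find("h4", string=re.compile("Home value"))
    if heading:
        paragraph = heading.find_next("p")
        if paragraph:
            return paragraph.get_text(strip=True)

    # Fallback from the script above: last whitespace-separated token in .zestimate
    tag = soup.select_one(".zestimate")
    if tag:
        tokens = tag.get_text().split()
        if tokens:
            return tokens[-1]

    return None


# Trimmed copy of the markup posted in the question
sample = """
<div class="primary-zestimate-item">
  <div class="content">
    <div class="zestimate-value">$486,997</div>
  </div>
</div>
"""

print(extract_zestimate(sample))
```

With a live response you would call looks_like_captcha(r.text) first and only pass r.text to extract_zestimate when it returns False. A None result then signals a layout that none of the three lookups cover.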