I have this code:
from pyquery import PyQuery as pq
import requests
url = "https://www.mba.org/news-and-research/forecasts-and-commentary"
content = requests.get(url).content
doc = pq(content)
Latest_Report_MO = doc("#ContentPlaceholder_C012_Col01")
print(Latest_Report_MO)
I get this result:
<div id="ContentPlaceholder_C012_Col01" class="sf_colsIn grid__unit grid__unit--1-3-l" data-sf-element="Column 2" data-placeholder-label="Column 2"> <div>
<div class="sfContentBlock sf-Long-text"><a target="_blank" href="/docs/default-source/research-and-forecasts/historical-mortgage-origination-estimates.xlsx?sfvrsn=8c6933cb_5"/><a style="margin-bottom:20px;" href="/docs/default-source/research-and-forecasts/forecasts/2023/historical-mortgage-origination-estimates.xlsx?sfvrsn=a7595901_1"/><a href="/docs/default-source/research-and-forecasts/historical-mortgage-origination-estimates.xlsx?sfvrsn=8c6933cb_5"/><a href="/docs/default-source/research-and-forecasts/historical-mortgage-origination-estimates.xlsx?sfvrsn=8c6933cb_5"/><a href="/docs/default-source/research-and-forecasts/historical-mortgage-origination-estimates.xlsx?sfvrsn=8c6933cb_5"><img src="/images/default-source/research/20125-research-forecast-web-button-qoe.png?sfvrsn=e73fc287_0" alt="" sf-size="66661"/></a> <p>Historical record of single-family, one- to four-unit loan origination estimates. Last updated June 2023. </p></div> </div>
</div>
I am interested in the href="/docs/default-source/research-and-forecasts/historical-mortgage-origination-estimates.xlsx?sfvrsn=8c6933cb_5"
How do I use .attr() to extract this URL? Or is there another method?
You can narrow the selection to the one link that has target="_blank" with doc("#ContentPlaceholder_C012_Col01 .sfContentBlock a[target='_blank']") and then read its href with .attr('href'):
from pyquery import PyQuery as pq
import requests
url = "https://www.mba.org/news-and-research/forecasts-and-commentary"
content = requests.get(url).content
doc = pq(content)
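# Select the <a target="_blank"> inside the content block of that column
items = doc("#ContentPlaceholder_C012_Col01 .sfContentBlock a[target='_blank']")
# Wrap the first matched element in PyQuery again and read its href attribute
print(pq(items[0]).attr('href'))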
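Note that the extracted href is site-relative (it starts with /docs/...), so it probably needs to be joined with the page URL before you can download the file. A minimal sketch using only the standard library follows; the href is hard-coded here from the output shown in the question, and the names page_url and absolute_url are just illustrative:

```python
from urllib.parse import urljoin

# Page the link was scraped from (same URL as in the question)
page_url = "https://www.mba.org/news-and-research/forecasts-and-commentary"

# Relative href as returned by .attr('href'); hard-coded here for illustration
href = "/docs/default-source/research-and-forecasts/historical-mortgage-origination-estimates.xlsx?sfvrsn=8c6933cb_5"

# urljoin resolves the root-relative path against the page's scheme and host
absolute_url = urljoin(page_url, href)
print(absolute_url)
```

In the real script you would pass the value returned by .attr('href') instead of the hard-coded string, and the resulting URL could then be given to requests.get() to fetch the spreadsheet.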