I'm trying to scrape some real estate articles from the following website:
I manage to get the links I need,but I am struggling with pagination on the web page.I'm trying to scrape every link under each category 'building relationships', 'building your team', 'capital rising' etc.Some of these categories pages have pagination and some of them do not contain pagination.I tried with the following code but it just gives me the links from 2 page.
from requests_html import HTMLSession
def tag_words_links(url):
global _session
_request = _session.get(url)
tags = _request.html.find('a.tag-cloud-link')
links = []
for link in tags:
links.append({
'Tags': link.find('a', first=True).text,
'Links': link.find('a', first=True).attrs['href']
})
return links
def parse_tag_links(link):
global _session
_request = _session.get(link)
articles = []
try:
next_page = _request.html.find('link[rel="next"]', first=True).attrs['href']
_request = _session.get(next_page)
article_links = _request.html.find('h3 a')
for article in article_links:
articles.append(article.find('a', first=True).attrs['href'])
except:
_request = _session.get(link)
article_links = _request.html.find('h3 a')
for article in article_links:
articles.append(article.find('a', first=True).attrs['href'])
return articles
if __name__ == '__main__':
_session = HTMLSession()
url = 'https://lifebridgecapital.com/podcast/'
links = tag_words_links(url)
print(parse_tag_links('https://lifebridgecapital.com/tag/multifamily/'))
To print title of every article under each tag and each page under the tag you can use this example:
import requests
from bs4 import BeautifulSoup
url = "https://lifebridgecapital.com/podcast/"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
tag_links = [a["href"] for a in soup.select(".tagcloud a")]
for link in tag_links:
while True:
print(link)
print("-" * 80)
soup = BeautifulSoup(requests.get(link).content, "html.parser")
for title in soup.select("h3 a"):
print(title.text)
print()
next_link = soup.select_one("a.next")
if not next_link:
break
link = next_link["href"]
Prints:
...
https://lifebridgecapital.com/tag/multifamily/
--------------------------------------------------------------------------------
WS890: Successful Asset Classes In The Current Market with Jerome Maldonado
WS889: How To Avoid A $1,000,000 Mistake with Hugh Odom
WS888: Value-Based On BRRRR VS Cap Rate with John Stoeber
WS887: Slow And Steady Still Wins The Race with Nicole Pendergrass
WS287: Increase Your NOI by Converting Units to Short Term Rentals with Michael Sjogren
WS271: Investment Strategies To Survive An Economic Downturn with Vinney Chopra
WS270: Owning a Construction Company Creates More Value with Abraham Ng’hwani
WS269: The Impacts of Your First Deal with Kyle Mitchell
WS260: Structuring Deals To Get The Best Return On Investment with Jeff Greenberg
WS259: Capital Raising For Newbies with Bryan Taylor
https://lifebridgecapital.com/tag/multifamily/page/2/
--------------------------------------------------------------------------------
WS257: Why Ground Up Development is the Best Investment with Sam Bates
WS256: Mobile Home Park Investing: The Real Deal with Jefferson Lilly
WS249: Managing Real Estate Paperwork Successfully with Krista Testani
WS245: Multifamily Syndication with Venkat Avasarala
WS244: Passive Investing In Real Estate with Kay Kay Singh
WS243: Getting Started In Real Estate Brokerage with Tyler Chesser
WS213: Data Analytics In Real Estate with Raj Tekchandani
WS202: Ben Leybovich and Sam Grooms on The Advantages Of A Partnership In Real Estate Business
WS199: Financial Freedom Through Real Estate Investing with Rodney Miller
WS197: Loan Qualifications: How The Whole Process Works with Vinney Chopra
https://lifebridgecapital.com/tag/multifamily/page/3/
--------------------------------------------------------------------------------
WS172: Real Estate Syndication with Kyle Jones
...