I've been exploring web scraping techniques using Python and RSS feeds, but I'm not sure how to restrict Google News search results to a particular year. Ideally, I'd like to retrieve headlines, publication dates, and possibly summaries for news articles from a specific year (such as 2020). With the code below I can scrape current news, but when I look for news from a specific year, nothing is returned. Even in the Google News search box, the date filter only goes back one year, although when I scroll down I can see articles from 2013 and 2017. Could someone provide a Python script or pointers on how to solve this?
Here's what I've attempted so far:
import feedparser
import pandas as pd
from datetime import datetime


class GoogleNewsFeedScraper:
    def __init__(self, query):
        self.query = query

    def scrape_google_news_feed(self):
        formatted_query = '%20'.join(self.query.split())
        rss_url = f'https://news.google.com/rss/search?q={formatted_query}&hl=en-IN&gl=IN&ceid=IN%3Aen'
        feed = feedparser.parse(rss_url)
        titles = []
        links = []
        pubdates = []
        if feed.entries:
            for entry in feed.entries:
                # Title
                title = entry.title
                titles.append(title)
                # URL link
                link = entry.link
                links.append(link)
                # Date
                pubdate = entry.published
                date_str = str(pubdate)
                date_obj = datetime.strptime(date_str, "%a, %d %b %Y %H:%M:%S %Z")
                formatted_date = date_obj.strftime("%Y-%m-%d")
                pubdates.append(formatted_date)
        else:
            print("Nothing Found!")
        data = {'URL link': links, 'Title': titles, 'Date': pubdates}
        return data

    def convert_data_to_csv(self):
        d1 = self.scrape_google_news_feed()
        df = pd.DataFrame(d1)
        csv_name = self.query + ".csv"
        csv_name_new = csv_name.replace(" ", "_")
        df.to_csv(csv_name_new, index=False)


if __name__ == "__main__":
    query = 'forex rate news'
    scraper = GoogleNewsFeedScraper(query)
    scraper.convert_data_to_csv()
You can use date filters in your rss_url. Modify the query part of the URL using the format below:
Format: q=query+after:yyyy-mm-dd+before:yyyy-mm-dd
For example, q=forex+rate+news+after:2023-11-01+before:2023-12-01 returns articles related to forex rate news that were published between November 1st, 2023, and December 1st, 2023.
Please refer to this article for more information.
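As a rough sketch of how the after:/before: filter could be applied to a whole year, the snippet below builds one RSS URL per month of 2020. The helper names `build_dated_rss_url` and `month_windows` are made up for illustration, and splitting the year into monthly windows is only an assumption, in case the feed limits how many entries it returns per request. The hl, gl and ceid values are copied from the question's URL.

```python
from datetime import date
from urllib.parse import urlencode

BASE_URL = "https://news.google.com/rss/search"


def build_dated_rss_url(query, after, before, hl="en-IN", gl="IN", ceid="IN:en"):
    """Return a Google News RSS search URL with after:/before: date filters.

    Hypothetical helper: the filters are simply appended to the q parameter.
    """
    dated_query = f"{query} after:{after.isoformat()} before:{before.isoformat()}"
    params = {"q": dated_query, "hl": hl, "gl": gl, "ceid": ceid}
    # urlencode turns spaces into '+' and percent-encodes ':' as %3A
    return f"{BASE_URL}?{urlencode(params)}"


def month_windows(year):
    """Yield (start, end) date pairs, one per calendar month of `year`."""
    for month in range(1, 13):
        start = date(year, month, 1)
        # December rolls over to January 1st of the following year
        end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        yield start, end


# One URL per month of 2020 for the query used in the question
urls = [build_dated_rss_url("forex rate news", start, end)
        for start, end in month_windows(2020)]
```

Each URL in `urls` can then be passed to `feedparser.parse(...)` in place of `rss_url` in the question's `scrape_google_news_feed` method, and the per-month results concatenated before writing the CSV.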