I am writing a program that pulls the top news headlines from Google News. It is supposed to print the headline and the link for each article, but it won't print the link.
import bs4
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen
news_url="https://news.google.com/news/rss"
Client=urlopen(news_url)
xml_page=Client.read()
Client.close()
soup_page=soup(xml_page,"lxml")
news_list=soup_page.findAll("item")
# Print news title, url and publish date
for news in news_list:
    print(news.title.text)
    print(news.link.text)
    print("-"*10)
This is an example of the output for one item:
Following Falcon 9 Saturday launch, CRS-17 Dragon arrives at the ISS
----------
It is supposed to print the headline and the link, but it only prints the headline.
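The feed is XML, but "lxml" is an HTML parser. In HTML, <link> is an empty tag, so the parser closes it immediately and the URL ends up as a plain text node right after it. That is why news.link.text is empty. If you change the for
loop in your code to this: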
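for news in news_list:
    title = news.select_one('title')
    print(title.text)
    # The sibling after <title> is the empty <link/> tag;
    # the sibling after that is the URL text itself.
    print(title.next_sibling.next_sibling)
    print("-"*10)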
You should get each headline followed by its link.
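As an alternative, parsing the feed as XML rather than HTML avoids the sibling workaround, because <link> then keeps its text. Below is a minimal sketch using only the standard library's xml.etree.ElementTree. The SAMPLE_RSS string and the extract_headlines helper are made up for illustration and stand in for the real response; the actual feed may contain more fields per item.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for the bytes returned by urlopen(news_url).read()
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <link>https://example.com/</link>
    <item>
      <title>First headline</title>
      <link>https://example.com/first</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>"""


def extract_headlines(xml_text):
    """Return a list of (title, link) pairs, one per <item> in the feed."""
    root = ET.fromstring(xml_text)
    # iter("item") walks every <item> element; findtext reads a child's text
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


headlines = extract_headlines(SAMPLE_RSS)
for title, link in headlines:
    print(title)
    print(link)
    print("-" * 10)
```

To use it with the real feed, pass xml_page from your code in place of SAMPLE_RSS. If you would rather stay with BeautifulSoup, passing "xml" instead of "lxml" as the parser name should have the same effect, since that selects lxml's XML parser.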