python web-scraping beautifulsoup html-parsing

BeautifulSoup: how to extract only the paragraph text from this page?


I want the text of all the p tags on this page, but I am unable to extract just the text. This is what I have tried so far:

import requests
from bs4 import BeautifulSoup

link = 'https://trumpwhitehouse.archives.gov/briefings-statements/remarks-president-trump-farewell-address-nation/'

page = requests.get(link)
soup = BeautifulSoup(page.content, 'lxml')
article = soup.find_all('p')
print(article)

The output still contains the p tags themselves. How do I remove those tags? Here is my output:

 <p>The White House</p>, <p>THE PRESIDENT: My fellow Americans:
    and Four years ago, we launched a.<p>, and to restore the allegiance this government to its citizens. In short, we embarked on a to make America all Americans.</p>, <p>As I conclude my term asthe 45th
of the United States, I  — and so much more.</p>, <p>This week, and pray for its our best wishes, and we also want  — a very important word.</p>

Solution

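  • import requests
    from bs4 import BeautifulSoup

    res = requests.get("https://trumpwhitehouse.archives.gov/briefings-statements/remarks-president-trump-farewell-address-nation/")
    soup = BeautifulSoup(res.text, "html.parser")
    # Restrict the search to the main content div, then collect its <p> tags.
    data = soup.find("div", class_="page-content").find_all("p")
    for d in data:
        # get_text() returns only the text, without the surrounding <p> markup.
        print(d.get_text())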
    

    Output:

    The White House
    THE PRESIDENT: My fellow Americans: Four years ago, we launched a great national effort to rebuild our country, to renew its spirit, and to restore the allegiance of this government to its citizens. In short, we embarked on a mission to make America great again — for all Americans.
    ....
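
The same approach can be tried offline. The following is only a sketch: `SAMPLE_HTML` is a made-up stand-in for the downloaded page, and `extract_paragraphs` is a hypothetical helper name, not part of BeautifulSoup. It assumes the paragraphs live in a `div` with class `page-content`, as in the answer above, and it additionally falls back to the whole document when that div is missing, collapses line breaks inside a paragraph, and skips empty paragraphs.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
SAMPLE_HTML = """
<html><body>
  <p>Site navigation</p>
  <div class="page-content">
    <p>The White House</p>
    <p>THE PRESIDENT: My fellow
       Americans:</p>
    <p></p>
  </div>
</body></html>
"""


def extract_paragraphs(html, container_class="page-content"):
    """Return the text of each non-empty <p> inside the given container."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", class_=container_class)
    if container is None:
        # Assumption: if the container is not found, search the whole page
        # instead of failing with an AttributeError on None.
        container = soup
    paragraphs = []
    for p in container.find_all("p"):
        # Collapse runs of whitespace (including newlines) into single spaces.
        text = " ".join(p.get_text().split())
        if text:
            paragraphs.append(text)
    return paragraphs


paragraphs = extract_paragraphs(SAMPLE_HTML)
for text in paragraphs:
    print(text)
```

Printing the list returned by `find_all` shows each tag's markup, which is why the original attempt still contained `<p>` and `</p>`; calling `get_text()` on each tag is what drops it.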