
How to read text off a website using Python (simple explanation)


I'm looking to make a program that can get the text off a website when given the website's URL. I would like to be able to get all text between the <p> and </p> tags. Everywhere I have looked online seems to overcomplicate this, and the solutions involve some coding in C, which I am not well versed in. To summarize, this is what I would like the code to look like (best-case scenario). If there's anything unclear in the question that I can clarify, please let me know in the comments.

    import WebReader as WR

    StringOfWebText = WR.getParagrahText("WebsiteURL")


Solution

  • You probably want to look into something like BeautifulSoup paired with requests. You can then extract text from a page with a simple solution like this:

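    import requests
    from bs4 import BeautifulSoup

    # Fetch the page and raise an error on HTTP failures (4xx/5xx)
    r = requests.get("https://google.com", timeout=10)
    r.raise_for_status()

    # Parse the HTML and print all of the text found on the page
    soup = BeautifulSoup(r.text, "html.parser")
    print(soup.get_text())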
    

    BS4 also has tag searching and other useful features built in. For example, soup.find_all("p") returns every <p> element, which is what you need if you only want the paragraph text.
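
To get closer to the one-call interface sketched in the question, the tag searching mentioned above could be wrapped in a small helper. This is only a minimal sketch: the function names extract_paragraph_text and get_paragraph_text are made up for illustration (they are not part of requests or BS4), and it assumes the text you want sits in ordinary <p> elements in the HTML the server returns, not in content generated later by JavaScript.

```python
import requests
from bs4 import BeautifulSoup


def extract_paragraph_text(html):
    """Return the text of every <p> element in an HTML string, one per line.

    Hypothetical helper name, chosen for this example only.
    """
    soup = BeautifulSoup(html, "html.parser")
    # find_all("p") returns the <p> elements in document order
    paragraphs = soup.find_all("p")
    # get_text() also includes text from nested tags such as <b> or <a>
    return "\n".join(p.get_text().strip() for p in paragraphs)


def get_paragraph_text(url):
    """Download a page and return the text inside its <p> tags.

    Hypothetical helper name, chosen for this example only.
    """
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return extract_paragraph_text(response.text)
```

With that in place, the call from the question becomes something like get_paragraph_text("https://example.com"). Keeping the parsing in its own function means you can also try it on an HTML string without making a network request.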