python web-scraping full-text-search

How do I return all strings containing a "£"?


I'm trying to scrape a badly designed site to gather the prices of items. The only thing the pages have in common is that every price starts with a "£", so I thought I could search through all the HTML content and return every string with a "£" attached.

I am not quite sure how to go about this. Any help is greatly appreciated.

Kind regards


Solution

  • If you just want to pull out the prices with a '£' prefix, you can try something like this:

    import re
    
    html = """
    cost of living is £2,232
    bottle of milk costs £1 and it goes up to £1.05 a year later...
    """
    
    # Match "£" followed by a run of non-whitespace characters
    print(re.findall(r"£\S+", html))
    

    Output:

    ['£2,232', '£1', '£1.05']
    

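    Note that \S+ matches any run of non-whitespace characters, so on real markup it can also pick up trailing punctuation or an adjacent tag (for example '£5</td>'). If you want to extract the item name along with the price, the regexp will need to be modified. The BeautifulSoup Python library can be used to extract information even from malformed HTML.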
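
As a rough sketch of getting the surrounding string along with each price, the snippet below walks the text nodes of the page and keeps only those that contain a price. It uses the standard library's html.parser rather than BeautifulSoup, purely so that it runs without installing anything. The names PRICE_RE, PriceTextCollector, find_price_strings and the sample markup are made up for illustration, and the stricter pattern assumes prices look like £1, £1.05 or £2,232.

```python
import re
from html.parser import HTMLParser

# Stricter than r"£\S+": digits, optional ",ddd" groups, optional decimals.
PRICE_RE = re.compile(r"£\d+(?:,\d{3})*(?:\.\d+)?")


class PriceTextCollector(HTMLParser):
    """Hypothetical helper: records each text node that contains a price."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so "&pound;" arrives as "£"
        self.hits = []  # (text node, prices found in it) pairs

    def handle_data(self, data):
        prices = PRICE_RE.findall(data)
        if prices:
            self.hits.append((data.strip(), prices))


def find_price_strings(html):
    """Return (text, prices) for every text node in html that holds a price."""
    collector = PriceTextCollector()
    collector.feed(html)
    collector.close()  # flush any text still buffered by the parser
    return collector.hits


# Made-up markup, for illustration only.
sample = "<ul><li>Milk £1.05</li><li>Bread: &pound;2,232.</li></ul>"
for text, prices in find_price_strings(sample):
    print(text, prices)
```

Each hit pairs the full text node with the prices found in it, so the item name comes along only when it sits in the same text node as the price. If the name lives in a different tag (say the name in a heading and the price in a span), you would need to look at the parent or sibling elements instead, which is where a library like BeautifulSoup becomes more convenient.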