Tags: python, python-3.x, beautifulsoup, html-parsing

How to replace an `img` tag inside a `td` tag using BeautifulSoup?


I'm trying to parse a webpage full of tables using Python 3.8.7 and BeautifulSoup 4.9.3 so that I can display it on a Telegram channel. I can get all the tables I need from the webpage, but deep inside those tables there are `td` tags containing `img` tags whose `src` is a star image, and these need to be replaced with a `p` tag. This is the code so far:

```python
import pickle
import bs4 as bs

# Load the pickled page; the object is expected to have a .content attribute
with open('data/pickled_data/pickled_v', 'rb') as v_file:
    v_pickled = pickle.load(v_file)

v_soup = bs.BeautifulSoup(v_pickled.content, "html5lib")
all_tbls = v_soup.find_all('table')
```

I've tried replacing the image (`star_image`) as shown below, but it raises `AttributeError: 'NoneType' object has no attribute 'replace_with'`:

```python
url_2_check = "https://i.imgur.com/ffIvqVj.png"

for table in all_tbls:
    for tr in table.find_all('tr'):
        for td in table.find_all('td'):
            for star_image in td.find_all('img'):
                if star_image['src'] == url_2_check:
                    p_tag = v_soup.new_tag('p')
                    p_tag.string = ":star:"
                    td.star_image.replace_with(p_tag)
```
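
Why this fails: on a BeautifulSoup `Tag`, attribute access such as `td.star_image` is shorthand for `td.find('star_image')`. It searches for a child *tag named* `star_image`, finds none, and returns `None`. The loop variable `star_image` already refers to the `<img>` element, so `replace_with` can be called on it directly. A minimal sketch, using a made-up HTML snippet and the built-in `html.parser` instead of `html5lib`:

```python
from bs4 import BeautifulSoup

url_2_check = "https://i.imgur.com/ffIvqVj.png"

# Hypothetical sample markup standing in for one cell of the scraped tables.
html = f'<table><tr><td><img src="{url_2_check}"><img src="other.png"></td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

td = soup.find("td")
# Attribute access looks for a child tag literally named "star_image".
print(td.star_image)  # None

for star_image in td.find_all("img"):
    if star_image.get("src") == url_2_check:
        p_tag = soup.new_tag("p")
        p_tag.string = ":star:"
        # Replace the <img> element itself, not an attribute of the <td>.
        star_image.replace_with(p_tag)

print(td)
```

Only the matching image is swapped for `<p>:star:</p>`; the other `<img>` stays in the cell.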

Then I tried the version below, but it raises `ValueError: Cannot replace one element with another when the element to be replaced is not part of a tree`:

```python
for table in all_tbls:
    for tr in table.find_all('tr'):
        for td in table.find_all('td'):
            for star_image in td.find_all('img'):
                if star_image['src'] == url_2_check:
                    p_tag = v_soup.new_tag('p')
                    p_tag.string = ":star:"
                    td.replace_with(p_tag)
```
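
Why this fails: `td.replace_with(p_tag)` swaps out the whole `<td>` and detaches it from the document (its `parent` becomes `None`), but the inner loop keeps iterating over the `<img>` list it collected beforehand. When a cell holds more than one star, the next matching image calls `td.replace_with(...)` again on a cell that is no longer in the tree. A small reproduction, assuming a hypothetical two-star cell and `html.parser`:

```python
from bs4 import BeautifulSoup

url_2_check = "https://i.imgur.com/ffIvqVj.png"

# Hypothetical cell holding two star images (a two-star rating).
html = f'<table><tr><td><img src="{url_2_check}"><img src="{url_2_check}"></td></tr></table>'
soup = BeautifulSoup(html, "html.parser")
td = soup.find("td")

caught = None
for star_image in td.find_all("img"):  # list is built once, before any change
    if star_image.get("src") == url_2_check:
        p_tag = soup.new_tag("p")
        p_tag.string = ":star:"
        try:
            td.replace_with(p_tag)  # 1st star: swaps out the whole <td>
        except ValueError as exc:  # 2nd star: the <td> is already detached
            caught = exc
            break

print(td.parent)  # None: the <td> is no longer part of the tree
print(caught)
```

Replacing the `<img>` (`star_image.replace_with(p_tag)`) instead of the `<td>` avoids this. Separately, the `tr` loop has no effect on which cells are visited: `table.find_all('td')` walks every cell in the table once per row, where `tr.find_all('td')` was probably intended.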

I can't figure out what I'm doing wrong. Can anybody help?

Thank you.


Solution

  • To parse the data from the tables, you can use the following example:

    import requests
    from bs4 import BeautifulSoup
    
    
    url = "https://xxviptips.blogspot.com/"
    soup = BeautifulSoup(requests.get(url).content, "lxml")
    
    # Assumes every <table> has the same layout: row 0 = league and time,
    # row 1 = three cells (match, tip, odds), row 2 = one star <img> per point.
    for t in soup.select("table"):
        rows = t.select("tr")
        league = rows[0].get_text(strip=True)
        match, info1, info2 = [
            td.get_text(strip=True) for td in rows[1].select("td")
        ]
        # Turn the star images into text by counting them
        rate_star = " ".join([":star:"] * len(rows[2].select("img")))
    
        # Print one fixed-width line per tip
        print(
            "{:<35} {:<35} {:<10} {:<10} {:<50}".format(
                league, match, info1, info2, rate_star
            )
        )
    

    Prints:

    Eng. Premier League - 19:00 GMT     Fulham - Burnley                    under 3.5  1.30       :star: :star: :star: :star:                       
    Spanish Liga Primera- 19:00 GMT     Betis - Granada CF                  over 1.5   1.30       :star: :star: :star:                              
    Spanish Liga Segunda- 19:00 GMT     Gijon - Lugo                        under 2.5  1.44       :star: :star: :star:                              
    Spanish Liga Segunda- 17:00 GMT     Rayo Vallecano - Leganes            DC - 1/X   1.30       :star: :star: :star:                              
    German Bundesliga 2- 16:00 GMT      Holstein Kiel - Hannover            DC - 1/X   1.25       :star: :star: :star: :star:                       
    German Bundesliga 2- 18:30 GMT      Hamburger SV - Nurnberg             under 4.5  1.22       :star: :star: :star: :star:                       
    Italian Serie B - 12:00 GMT         Pescara - Salernitana               away win   1.27       :star: :star: :star:                              
    Italian Serie B- 12:00 GMT          Empoli - Lecce                      DC - 1/X   1.31       :star: :star: :star: :star:                       
    Romanian Liga 1- 18:30 GMT          FCSB - FC Clinceni                  home win   1.27       :star: :star: :star: :star:                       
    Romanian Liga 1- 13:45 GMT          FC Voluntari - Chindia Targoviste   under 2.5  1.38       :star: :star: :star:                              
    Portuguese Prim. Liga- 19:15 GMT    Porto - Sporting Farense            home win   1.33       :star: :star: :star: :star:                       
    Portuguese Prim. Liga - 17:00 GMT   Portimonense - Moreirense           under 3.5  1.25       :star: :star: :star: :star:
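
The example above counts the `<img>` tags in each table's third row rather than editing the tree. If you do want to swap every star image for a `<p>` tag in place, as the question asks, one way is to select only the matching images with a CSS attribute selector and replace each one. This is a sketch with a made-up table in place of the pickled page, using `html.parser` instead of `html5lib`:

```python
from bs4 import BeautifulSoup

url_2_check = "https://i.imgur.com/ffIvqVj.png"

# Hypothetical table shaped like the ones above (league / match / stars).
html = f"""
<table>
  <tr><td>Eng. Premier League - 19:00 GMT</td></tr>
  <tr><td>Fulham - Burnley</td><td>under 3.5</td><td>1.30</td></tr>
  <tr><td><img src="{url_2_check}"><img src="{url_2_check}"><img src="{url_2_check}"></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# select() returns a plain list, so replacing elements while looping is safe.
for star_image in soup.select(f'table td img[src="{url_2_check}"]'):
    p_tag = soup.new_tag("p")
    p_tag.string = ":star:"
    star_image.replace_with(p_tag)

# Flatten each table to text, e.g. for posting to the Telegram channel.
for table in soup.find_all("table"):
    print(table.get_text(" ", strip=True))
```

Because only images with the star `src` are selected, any other pictures in the cells are left untouched.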