I'm trying to parse a webpage filled with tables using Python 3.8.7 and BeautifulSoup 4.9.3 so I can display it on a Telegram channel. I can get all the necessary tables from the webpage, but deep inside those tables there are td tags that contain img tags whose src points to a star image, and these need to be replaced with a p tag. This is the code so far:
import pickle
import bs4 as bs
v_file = open('data/pickled_data/pickled_v', 'rb')
v_pickled = pickle.load(v_file)
v_soup = bs.BeautifulSoup(v_pickled.content, "html5lib")
all_tbls = v_soup.find_all('table')
I've tried replacing the image (star_image in the code) as below, but it raises AttributeError: 'NoneType' object has no attribute 'replace_with':
url_2_check = "https://i.imgur.com/ffIvqVj.png"
for table in all_tbls:
    for tr in table.find_all('tr'):
        for td in table.find_all('td'):
            for star_image in td.find_all('img'):
                if star_image['src'] == url_2_check:
                    p_tag = v_soup.new_tag('p')
                    p_tag.string = ":star:"
                    td.star_image.replace_with(p_tag)
Then I tried it as below, but it raises ValueError: Cannot replace one element with another when the element to be replaced is not part of a tree:
for table in all_tbls:
    for tr in table.find_all('tr'):
        for td in table.find_all('td'):
            for star_image in td.find_all('img'):
                if star_image['src'] == url_2_check:
                    p_tag = v_soup.new_tag('p')
                    p_tag.string = ":star:"
                    td.replace_with(p_tag)
I can't figure out what I'm doing wrong. Can anybody please help?
Thank you.
To parse the data from the tables, you can use the following example:
import requests
from bs4 import BeautifulSoup

url = "https://xxviptips.blogspot.com/"
soup = BeautifulSoup(requests.get(url).content, "lxml")

for t in soup.select("table"):
    rows = t.select("tr")
    # First row: league name and kick-off time.
    league = rows[0].get_text(strip=True)
    # Second row: exactly three cells (match, tip, odds).
    match, info1, info2 = [
        td.get_text(strip=True) for td in rows[1].select("td")
    ]
    # Third row: one ":star:" per <img> in the rating row.
    rate_star = " ".join([":star:"] * len(rows[2].select("img")))
    print(
        "{:<35} {:<35} {:<10} {:<10} {:<50}".format(
            league, match, info1, info2, rate_star
        )
    )
Prints:
Eng. Premier League - 19:00 GMT     Fulham - Burnley                    under 3.5  1.30       :star: :star: :star: :star:
Spanish Liga Primera- 19:00 GMT     Betis - Granada CF                  over 1.5   1.30       :star: :star: :star:
Spanish Liga Segunda- 19:00 GMT     Gijon - Lugo                        under 2.5  1.44       :star: :star: :star:
Spanish Liga Segunda- 17:00 GMT     Rayo Vallecano - Leganes            DC - 1/X   1.30       :star: :star: :star:
German Bundesliga 2- 16:00 GMT      Holstein Kiel - Hannover            DC - 1/X   1.25       :star: :star: :star: :star:
German Bundesliga 2- 18:30 GMT      Hamburger SV - Nurnberg             under 4.5  1.22       :star: :star: :star: :star:
Italian Serie B - 12:00 GMT         Pescara - Salernitana               away win   1.27       :star: :star: :star:
Italian Serie B- 12:00 GMT          Empoli - Lecce                      DC - 1/X   1.31       :star: :star: :star: :star:
Romanian Liga 1- 18:30 GMT          FCSB - FC Clinceni                  home win   1.27       :star: :star: :star: :star:
Romanian Liga 1- 13:45 GMT          FC Voluntari - Chindia Targoviste   under 2.5  1.38       :star: :star: :star:
Portuguese Prim. Liga- 19:15 GMT    Porto - Sporting Farense            home win   1.33       :star: :star: :star: :star:
Portuguese Prim. Liga - 17:00 GMT   Portimonense - Moreirense           under 3.5  1.25       :star: :star: :star: :star:
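
If you would rather keep the HTML and swap the images in place, as the question attempts, here is a minimal sketch of one way to do it. As far as I can tell, the two errors come from this: td.star_image is Beautiful Soup shorthand for td.find('star_image'), which looks for a tag literally named star_image and returns None; and td.replace_with(p_tag) detaches the whole td on the first matching image, so the next image found in that same td tries to replace a td that is no longer in the tree. Calling replace_with on the img itself avoids both. The HTML string below is made-up sample markup, not the real page, and html.parser is used only so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for one of the page's tables.
html = """
<table>
  <tr><td>Fulham - Burnley</td></tr>
  <tr><td><img src="https://i.imgur.com/ffIvqVj.png"><img src="https://i.imgur.com/ffIvqVj.png"><img src="other.png"></td></tr>
</table>
"""

url_2_check = "https://i.imgur.com/ffIvqVj.png"
soup = BeautifulSoup(html, "html.parser")

# find_all returns a list, so the tree can be modified safely while looping.
# The src keyword filters to <img> tags whose src equals the star URL.
for star_image in soup.find_all("img", src=url_2_check):
    # Create a fresh <p> per image; a single tag cannot sit in two places.
    p_tag = soup.new_tag("p")
    p_tag.string = ":star:"
    # Replace the image itself, not its parent <td>.
    star_image.replace_with(p_tag)

print(soup.prettify())
```

With v_soup from the question, the same loop should work by iterating over v_soup.find_all("img", src=url_2_check) directly, so the nested table/tr/td loops are not needed.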