My Python code extracts specific data from HTML, but it only works when the HTML contains a single instance.
What I need is code that iterates through an HTML file with several instances and retrieves the same information from each one. How can I achieve that?
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8"/>
<title>Exported Data</title>
<meta content="width=device-width, initial-scale=1.0" name="viewport"/>
<link href="css/style.css" rel="stylesheet"/>
<script src="js/script.js" type="text/javascript">
</script>
</head>
<body onload="CheckLocation();">
<div class="page_wrap">
<div class="page_header">
<div class="content">
<div class="text bold">
π€π₯ π¬πππ π©ππ - πΆπππ 2.5
</div>
</div>
</div>
<div class="page_body chat_page">
<div class="history">
<div class="message service" id="message-1">
<div class="body details">
9 March 2023
</div>
</div>
<div class="message default clearfix" id="message3984">
<div class="pull_left userpic_wrap">
<div class="userpic userpic2" style="width: 42px; height: 42px">
<div class="initials" style="line-height: 42px">
?
</div>
</div>
</div>
<div class="body">
<div class="pull_right date details" title="09.03.2023 00:27:10 UTC-03:00">
00:27
</div>
<div class="from_name">
π€π₯ π¬πππ π©ππ - πΆπππ 2.5
</div>
<div class="text">
Easy Bot - Over 2.5<br><br>π Liga: Premiership<br>π¦ Entrada: Over 2.5 FT<br>⚽ Jogos: ✅ 03:30 03:33 03:36 ( 03:39)<br><br><strong>Link: </strong><a href="https://www.bet365.com/#/AVR/B146/R%5E1/">https://www.bet365.com/#/AVR/B146/R%5E1/</a><br><br>π 24h:100% de acerto nas últimas 24h<br><br>✅✅✅✅✅✅.
</div>
</div>
</div>
<div class="message default clearfix" id="message3985">
<div class="pull_left userpic_wrap">
<div class="userpic userpic2" style="width: 42px; height: 42px">
<div class="initials" style="line-height: 42px">
?
</div>
</div>
</div>
<div class="body">
<div class="pull_right date details" title="09.03.2023 00:45:16 UTC-03:00">
00:45
</div>
<div class="from_name">
π€π₯ π¬πππ π©ππ - πΆπππ 2.5
</div>
<div class="text">
Easy Bot - Over 2.5<br><br>π Liga: Premiership<br>π¦ Entrada: Over 2.5 FT<br>⚽ Jogos: ✅ 03:48 03:51 03:54 ( 03:57)<br><br><strong>Link: </strong><a href="https://www.bet365.com/#/AVR/B146/R%5E1/">https://www.bet365.com/#/AVR/B146/R%5E1/</a><br><br>π 24h:100% de acerto nas últimas 24h<br><br>✅✅✅✅✅✅.
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>
Well, this one is somewhat more complex than your previous question, so it needs a few more acrobatics:
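from bs4 import BeautifulSoup

# Assumption: `html` holds the exported page shown above as a string
soup = BeautifulSoup(html, "html.parser")

# Exact match on class="body" skips the service messages (class="body details")
for b in soup.select('div[class="body"]'):
    # The title attribute looks like "09.03.2023 00:27:10 UTC-03:00"
    d_str = b.select_one('div.date.details')['title']
    calendar = d_str.split(" ")
    print("Date:", calendar[0])
    print("Time:", calendar[1])
    targets = b.select('div.text')
    for target in targets:
        for sts in target.stripped_strings:
            if "⚽ Jogos: " in sts:
                # Join "( 03:39)" into "(03:39)" before splitting on spaces
                jugos = [elem for elem in sts.split('⚽ Jogos: ')[1].replace('( ', "(").split(" ") if elem]
                if "✅" in jugos:
                    # 1-based position of the check mark among the tokens
                    ind = jugos.index('✅') + 1
                    print("Checkmarked:", ind)
                    jugos.remove("✅")
                    print(jugos)
                else:
                    print(jugos)
                    print("Checkmarked: NA")
    print('------------------------------------')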
Output:
Date: 09.03.2023
Time: 00:27:10
Checkmarked: 1
['03:30', '03:33', '03:36', '(03:39)']
------------------------------------
Date: 09.03.2023
Time: 00:45:16
Checkmarked: 1
['03:48', '03:51', '03:54', '(03:57)']
------------------------------------
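If you would rather collect the values than print them, the two string-handling steps can be pulled out into small helpers. The following is only a sketch under a few assumptions: the names `parse_title` and `parse_jogos` are made up for illustration, the `title` attribute always has the `DD.MM.YYYY HH:MM:SS UTC±HH:MM` shape seen in the sample, and the check mark is always separated from the times by a space.

```python
from datetime import datetime

CHECK = "\u2705"             # the check mark emoji used in the messages
MARKER = "\u26bd Jogos: "    # the soccer-ball prefix of the games line


def parse_title(title):
    """Turn '09.03.2023 00:27:10 UTC-03:00' into a timezone-aware datetime."""
    date_part, time_part, offset = title.split(" ")
    # 'UTC-03:00' -> '-0300', a form that strptime's %z accepts
    offset = offset.replace("UTC", "").replace(":", "")
    return datetime.strptime(
        f"{date_part} {time_part} {offset}", "%d.%m.%Y %H:%M:%S %z"
    )


def parse_jogos(line):
    """Return (times, checked) for a games line.

    `checked` is the 1-based position of the check mark among the tokens,
    or None when the line has no check mark.
    """
    tokens = line.split(MARKER)[1].replace("( ", "(").split()
    checked = tokens.index(CHECK) + 1 if CHECK in tokens else None
    times = [t for t in tokens if t != CHECK]
    return times, checked


when = parse_title("09.03.2023 00:27:10 UTC-03:00")
print(when.isoformat())  # 2023-03-09T00:27:10-03:00

times, checked = parse_jogos(MARKER + CHECK + " 03:30 03:33 03:36 ( 03:39)")
print(times, checked)
```

Inside the loop above you would call `parse_title(d_str)` and `parse_jogos(sts)` and append the results to a list of dicts, one per message, instead of printing them.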