Let's assume we have the following HTML:
<tr class=" " somethingc1="" somethingc2="" somethingc3="" data-something="1" something="1something4" something_id="6something7">
<td class="text-center td_something">
<div>
<span doo="true" class="foo" style="left:70%;z-index:99;">
<span doo="true" class="foo" style="left:50%;z-index:90;">
<span doo="true" class="Kung foo" style="left:90%;z-index:95;">
</div>
</td>
</tr>
<tr class=" " somethingc1="" somethingc2="" somethingc3="" data-something="1" something="1something4" something_id="6something7">
<td class="text-center td_something">
<div>
<span doo="true" class="Kung foo" style="left:35%;z-index:95;">
</div>
</td>
</tr>
<tr class=" " somethingc1="" somethingc2="" somethingc3="" data-something="1" something="1something4" something_id="6something7">
<td class="text-center td_something">
<div>
<span doo="true" class="foo" style="left:99%;z-index:100;">
</div>
</td>
</tr>
How can I use bs4 (BeautifulSoup) in Python to build a list holding, for each tr, the highest 'left' value found in the spans' 'style' attributes, while ignoring any span whose class includes "Kung"?
The desired result would be:
[70,False or NaN,99]
I figured I should start with something like:
import re
trs = soup.find_all('tr', attrs={"data-something": "1"})
lefts = []
for tr in trs:
    spans = tr.find_all('span', {'style': re.compile(r'^left:.')})
>>> import bs4
>>> HTML = open('temp.htm').read()
>>> soup = bs4.BeautifulSoup(HTML, 'lxml')
First, select all of the elements whose class contains foo (whether or not it contains something else as well).
>>> elements = soup.select('.foo')
In each case, element['class'] is a list of the classes on the element: for this HTML, either just foo or both Kung and foo. Testing the length of element['class'] is therefore a test for foo on its own.
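Checking the length of element['class'] works for this HTML, but it would also drop a span that carried foo plus some unrelated class. A minimal sketch of a more explicit test, assuming Kung is the only class to exclude, checks for its absence instead. It uses html.parser so that lxml is not required:

```python
from bs4 import BeautifulSoup

html = '<span class="foo"></span><span class="Kung foo"></span>'
soup = BeautifulSoup(html, 'html.parser')

for span in soup.select('.foo'):
    classes = span['class']  # bs4 splits the multi-valued class attribute into a list
    print(classes, 'Kung' not in classes)
# ['foo'] True
# ['Kung', 'foo'] False
```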
element['style'] returns the contents of the element's style attribute. Use a regex to pull out the left value, convert it to an int, and add it to a list called lefts.
>>> import re
>>> lefts = []
>>> for element in elements:
...     if len(element['class']) == 1:  # only 'foo', not 'Kung foo'
...         lefts.append(int(re.search(r'left:([0-9]+)', element['style']).group(1)))
...
>>>
>>> lefts
[70, 50, 99]
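The pattern left:([0-9]+) would also match inside properties such as margin-left, and it silently truncates a value like 12.5 to 12. A hedged sketch of a stricter helper, assuming percentages are the only unit of interest (left_percent and LEFT_RE are hypothetical names), anchors the property name at the start of the string or after a semicolon:

```python
import re
from typing import Optional

# Hypothetical helper: match a `left` declaration only at the start of the style
# string or right after ';', so `margin-left` / `padding-left` are not picked up.
LEFT_RE = re.compile(r'(?:^|;)\s*left\s*:\s*(-?\d+(?:\.\d+)?)%', re.IGNORECASE)

def left_percent(style: str) -> Optional[float]:
    """Return the percentage value of `left` in an inline style, or None."""
    match = LEFT_RE.search(style)
    return float(match.group(1)) if match else None

print(left_percent("left:70%;z-index:99;"))         # 70.0
print(left_percent("margin-left:5%; left: 12.5%"))  # 12.5
print(left_percent("z-index:95;"))                  # None
```

It returns a float rather than an int so decimal percentages survive; 70.0 still compares equal to 70.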
Edit:
Find the tr elements, then look inside each one for the elements with class foo. As before, consider only the elements whose class is just foo, not both foo and Kung. Gather the left values from their style attributes, then take the maximum for each tr, appending None when a tr has no qualifying elements.
>>> import re
>>> import bs4
>>> HTML = open('temp.htm').read()
>>> soup = bs4.BeautifulSoup(HTML, 'lxml')
>>> trs = soup.find_all('tr')
>>> tr_max = []
>>> for tr in trs:
...     elements = tr.select('.foo')
...     lefts = []
...     for element in elements:
...         if len(element['class']) == 1:  # skip 'Kung foo'
...             lefts.append(int(re.search(r'left:([0-9]+)', element['style']).group(1)))
...     if lefts:
...         tr_max.append(max(lefts))
...     else:
...         tr_max.append(None)
...
>>> tr_max
[70, None, 99]
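The same steps can be sketched as a self-contained function outside the interactive session. This version swaps the length check for the CSS selector span.foo:not(.Kung) (handled by soupsieve, which bs4 uses for select() since version 4.7), keeps the question's data-something="1" filter, and uses html.parser so lxml is not required. max_left_per_row is a hypothetical name, and the sample HTML is the question's markup with the spans closed and wrapped in a table:

```python
import re
from typing import List, Optional

from bs4 import BeautifulSoup

# Only a standalone `left` declaration, not `margin-left` etc.
LEFT_RE = re.compile(r'(?:^|;)\s*left\s*:\s*(\d+(?:\.\d+)?)%')

def max_left_per_row(html: str) -> List[Optional[float]]:
    """For each matching <tr>, return the largest `left` percentage among
    spans with class foo but not Kung, or None if the row has none."""
    soup = BeautifulSoup(html, 'html.parser')
    result = []
    for tr in soup.find_all('tr', attrs={'data-something': '1'}):
        lefts = []
        for span in tr.select('span.foo:not(.Kung)'):
            match = LEFT_RE.search(span.get('style', ''))
            if match:
                lefts.append(float(match.group(1)))
        result.append(max(lefts, default=None))  # None for rows with no match
    return result

html = """
<table>
<tr data-something="1"><td><div>
  <span class="foo" style="left:70%;z-index:99;"></span>
  <span class="foo" style="left:50%;z-index:90;"></span>
  <span class="Kung foo" style="left:90%;z-index:95;"></span>
</div></td></tr>
<tr data-something="1"><td><div>
  <span class="Kung foo" style="left:35%;z-index:95;"></span>
</div></td></tr>
<tr data-something="1"><td><div>
  <span class="foo" style="left:99%;z-index:100;"></span>
</div></td></tr>
</table>
"""
print(max_left_per_row(html))  # [70.0, None, 99.0]
```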