I have HTML code like this:
<div>
Foo <span>Bar</span><br />
Baz<br />
<b>Foobar</b> Quux
</div>
Now I'd like to process the nodes separated by <br /> tags like this:
nodes = sel.xpath("???")
my_foo = nodes[0] # contains Foo <span>Bar</span>
my_bar = nodes[1] # contains Baz
my_fb = nodes[2] # contains <b>Foobar</b> Quux
Is there some XPath or CSS expression that will do this, or do I have to iterate over all child nodes of <div>, building an array in the process for each node that is not a <br>?
The closest I can think of is this:
[sel.xpath('''.//div/node()[count(preceding-sibling::br)=%d]
[not(self::br)]''' % i).extract()
for i in range(0, len(sel.xpath('.//div/br'))+1)]
which gives you:
[[u'\n Foo ', u'<span>Bar</span>'],
[u'\n Baz'],
[u'\n ', u'<b>Foobar</b>', u' Quux\n']]
These are the lists of nodes between the <br/> elements under <div>: the expression counts the preceding <br> siblings of each node and selects the nodes that have none, then 1, then 2 of them before them.
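For comparison, the manual iteration mentioned in the question can be sketched with the standard library alone. This is only an illustration, not the XPath approach above: it assumes the snippet is well-formed XML (it is here, because the <br /> tags are self-closed), so xml.etree.ElementTree can parse it; real-world HTML would need an HTML parser instead. The names HTML, serialize and split_on_br are hypothetical helpers introduced for this example.

```python
import xml.etree.ElementTree as ET

# The sample markup from the question (well-formed, so an XML parser accepts it).
HTML = """<div>
Foo <span>Bar</span><br />
Baz<br />
<b>Foobar</b> Quux
</div>"""


def serialize(elem):
    """Return the markup of `elem` without the text that follows it."""
    tail, elem.tail = elem.tail, None  # ET.tostring would include the tail
    try:
        return ET.tostring(elem, encoding="unicode")
    finally:
        elem.tail = tail  # restore the tree as it was


def split_on_br(parent):
    """Group the child nodes of `parent` into runs separated by <br>."""
    groups = [[]]
    if parent.text:
        groups[-1].append(parent.text)  # text before the first child
    for child in parent:
        if child.tag == "br":
            groups.append([])  # a <br> starts a new group
        else:
            groups[-1].append(serialize(child))
        if child.tail:
            groups[-1].append(child.tail)  # text after this child
    return groups


groups = split_on_br(ET.fromstring(HTML))
for group in groups:
    print(group)
```

Each <br> opens a new group, and the text following an element (its tail in ElementTree terms) is added to whichever group is current, so the grouping has the same shape as the XPath result above.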