I have the following HTML:
<ul class="filtering_new" width="50%">
<li class="filter">1</li>
<li class="filter">2</li>
<script>Alert('1');</script>
<li class="filter">3</li>
</ul>
How can I get the li with inner_html = 3?
I tried this:
page.search("//ul.filtering_new").each do |list|
puts list.search("li").size
end
where page is the HTML document. The size printed is 2, but it should be 3.
I tried to follow the manual at https://github.com/hpricot/hpricot/wiki/hpricot-challenge, but I can't even find the <script> element: list.search("script") returns nothing.
Most XML/HTML parsing in Ruby uses Nokogiri these days, so I'll recommend that parser. However, both Hpricot and Nokogiri support XPath and CSS, so they are fairly interchangeable.
I'd go about it this way:
html = <<EOT
<ul class="filtering_new" width="50%">
<li class="filter">1</li>
<li class="filter">2</li>
<script>Alert('1');</script>
<li class="filter">3</li>
</ul>
EOT
require 'nokogiri'
doc = Nokogiri::HTML(html)
# Find every li with class "filter", then keep the ones whose text converts to 3.
li = doc.search('//li[@class="filter"]').select { |n| n.text.to_i == 3 }
li # => [#<Nokogiri::XML::Element:0x8053fc84 name="li" attributes=[#<Nokogiri::XML::Attr:0x8053fb6c name="class" value="filter">] children=[#<Nokogiri::XML::Text:0x80546f98 "3">]>]
That finds the candidate nodes with XPath, which returns them as a NodeSet; select then iterates over that set and keeps only the nodes whose text converts to the integer 3.
Alternatively, let XPath do the text comparison itself:
li = doc.search('//li[text() = "3"]')
li # => [#<Nokogiri::XML::Element:0x8053fc84 name="li" attributes=[#<Nokogiri::XML::Attr:0x8053fb6c name="class" value="filter">] children=[#<Nokogiri::XML::Text:0x80546f98 "3">]>]
That offloads more of the comparison to the underlying libxml2 library, where it runs in C instead of in a Ruby block, which is generally a lot faster.