I have a document that looks like the following:
<ul>
<li>
<a href="/Synergies">Link</a>Content
</li>
<li>
Content <a href="/Synergies">Link</a>
</li>
</ul>
I would like to obtain only the list items that start with an <a> tag, i.e. the first <li> would be a hit but the second would not.
I tried getting all list items and regex-matching on the HTML content, but it doesn't appear to work:
list.search('li').each do |item|
  if /^<a href="\/Synergies".*$/.match(item)
    puts link # hit?
  end
end
Any advice would be appreciated!
You can check whether the item's first child is either not text or empty text:
list.search('li').each do |item|
  if !item.children.first.text? || item.children.first.text.strip.empty?
    puts item # hit?
  end
end
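If you want to exclude items that don't begin with a link, you can select the links that are the first element child of their list item and check the parent's first child node in the condition (note that item here is the <a>, so the list item itself is item.parent):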
list.search('li > a:first-child').each do |item|
  if !item.parent.children.first.text? || item.parent.children.first.text.strip.empty?
    puts item # hit?
  end
end