ruby regex hpricot

Searching Hpricot with Regex


I'm trying to use Hpricot to get the value within a span with a class name I don't know. I know that it follows the pattern "foo_[several digits]_bar".

Right now, I'm getting the entire containing element as a string and using a regex to parse the string for the tag. That solution works, but it seems really ugly.

doc = Hpricot(open("http://scrape.example.com/search?q=#{ticker_symbol}"))
elements = doc.search("//span[@class='pr']").inner_html
string = ""
elements.each do |attr|
  if(attr =~ /foo_\d+_bar/)
    string = attr
  end
end
# get rid of the span tags, just get the value
string.sub!(/<\/span>/, "")
string.sub!(/<span.+>/, "")

return string

It seems like there should be a better way to do that. I'd like to do something like:

elements = doc.search("//span[@class='" + /foo_\d+_bar/ + "']").inner_html

But that doesn't run. Is there a way to search with a regular expression?


Solution

  • This should do it, using the attribute "starts with" (^=) and "ends with" ($=) selectors:

    doc.search("span[@class^='foo'][@class$='bar']")
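Note that the selector above only checks the beginning and end of the class attribute, so it would also accept something like "foo_abc_bar". If the digits in the middle matter, one option is to select all spans and filter them in Ruby with the regex. Below is a minimal sketch of that filtering step. The helper name foo_bar_class? and the sample data are made up for illustration, and plain hashes stand in for parsed elements so the snippet runs without Hpricot installed.

```ruby
# True if any whitespace-separated class name in the attribute value is
# exactly "foo_", one or more digits, then "_bar".
# class_attr may be nil when the element has no class attribute.
def foo_bar_class?(class_attr)
  pattern = /\Afoo_\d+_bar\z/ # \A and \z anchor the whole class name
  class_attr.to_s.split.any? { |name| name =~ pattern }
end

# Hypothetical stand-ins for parsed <span> elements.
spans = [
  { class: "foo_12345_bar", text: "42.10" },
  { class: "foo_abc_bar",   text: "prefix and suffix match, digits do not" },
  { class: "pr",            text: "unrelated span" },
  { class: nil,             text: "span without a class" },
]

# Keep only the spans whose class matches the full pattern.
matches = spans.select { |span| foo_bar_class?(span[:class]) }
puts matches.map { |span| span[:text] }.inspect
```

With Hpricot itself, the same filter would presumably look like doc.search("span").select { |el| foo_bar_class?(el["class"]) }, followed by inner_html on the element you want. This assumes el["class"] returns the attribute value, so check it against your Hpricot version.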