ruby hpricot

Hpricot parse image alt text


I am trying to pull out the alt text from all images on a page using Hpricot but can't figure out how to do it.

Has anyone done this before?

Thanks! Dennis


Solution

  • This is my first time using Hpricot, so be gentle. I think this isolates the data you were asking about.

    require 'rubygems'
    require 'hpricot'
    
    page = '<html><body><p>Create a link of an image:<a href="default.asp"><img src="smiley.gif" alt="alt_text_1" width="32" height="32" /></a></p><p>No border around the image, but still a link:<a href="default.asp"><img border="0" src="smiley.gif" alt="alt_text_2" width="32" height="32" /></a></p></body></html>'
    doc = Hpricot(page)
    
    # Find every <img> element in the document and print its alt attribute.
    doc.search("//img").each do |img|
      puts img.attributes['alt']
    end
    

    Output looks like this:

    #=> alt_text_1
    #=> alt_text_2