ruby, arrays, hpricot

Build array of flashvars using hpricot


I have used Hpricot before to grab content from websites that sits inside HTML tags. Now I am trying to build an array of all the flashvars found on this page: view-source:http://megavideo.com/?v=014U2YO9

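require 'hpricot'
require 'open-uri'

flashvars = []
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer fetches URLs
doc = Hpricot(URI.open("http://megavideo.com/?v=014U2YO9"))

# Collect every matching <param> element (the loop variable must not reuse the array's name)
(doc/"/param[@name='flashvars']").each do |flashvar|
  flashvars << flashvar
end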

I have been trying the code snippet above and I hope I'm on the right track. Would anyone be able to help me further?

Thank you


Solution

  • Your selector is looking for attributes on <param> elements, but no such markup exists on that page. Instead, the page contains many JavaScript assignments to properties of a flashvars object. Assuming those are what you want, you don't need Hpricot at all, just a regex over the JavaScript source. This seems to work:

    require 'open-uri'
    html = URI.open("http://megavideo.com/?v=014U2YO9").read  # plain open() no longer fetches URLs on Ruby 3.0+
    
    flashvars = Hash[ html.scan( /flashvars\.(\w+)\s*=\s*["']?(.+?)["']?;/ ) ]
    
    require 'pp' # Just for pretty output here
    pp flashvars
    
    #=> {"logintxt"=>"Login",
    #=>  "registertxt"=>"Register",
    #=>  "searchtxt"=>"Search videos",
    #=>  "searchrestxt"=>"\"",
    #=>  "useSystemFont"=>"0",
    #=>  "size"=>"17",
    #=>  "loginAct"=>"?c=login%26next%3Dv%253D014U2YO9",
    #=>  "registerAct"=>"?c=signup",
    #=>  "userAct"=>"?c=account",
    #=>  "signoutAct"=>"javascript:signout()",
    #=>  "myvideostxt"=>"My Videos",
    #=>  "videosAct"=>"?c=myvideos",
    #=>  "added"=>"2011-04-14",
    #=>  "username"=>"beenerkeekee19952",
    #=>  etc.
    

    Note that this leaves every value as a string in Ruby, even values that were numbers in JavaScript. Because the regex strips leading and trailing quote marks from JavaScript strings, you cannot distinguish flashvars.foo = 42; from flashvars.bar = "42";.
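If you need to keep that distinction, one possible approach is sketched below. It runs offline against a hypothetical JavaScript snippet (SAMPLE_JS stands in for the downloaded page). scan_untyped reuses the regex from the answer; scan_typed is an illustrative variant, not part of the original answer. It captures the opening quote, requires the same quote to close the value, and converts bare integer or decimal literals to Integer/Float. It assumes values contain no escaped quotes and that numbers are written as plain literals; any other unquoted expression is kept as raw text.

```ruby
# Hypothetical sample standing in for the page's JavaScript
SAMPLE_JS = <<~JS
  flashvars.foo = 42;
  flashvars.bar = "42";
  flashvars.ratio = 1.5;
  flashvars.logintxt = 'Login';
  flashvars.video = videoId;
JS

# Regex from the answer: quotes are optional and discarded, so every value is a String
def scan_untyped(js)
  Hash[js.scan(/flashvars\.(\w+)\s*=\s*["']?(.+?)["']?;/)]
end

# Illustrative variant (assumption: no escaped quotes inside values).
# Groups: 1 = name, 2 = opening quote, 3 = quoted body, 4 = unquoted value.
TYPED_RE = /flashvars\.(\w+)\s*=\s*(?:(["'])(.*?)\2|([^;\n]+?))\s*;/

def scan_typed(js)
  # Array#to_h with a block requires Ruby 2.6+
  js.scan(TYPED_RE).to_h do |name, _quote, quoted, bare|
    value =
      if quoted
        quoted                  # was a JS string literal: keep as String
      elsif bare =~ /\A-?\d+\z/
        bare.to_i               # plain integer literal
      elsif bare =~ /\A-?\d+\.\d+\z/
        bare.to_f               # plain decimal literal
      else
        bare                    # other expression (e.g. a variable): raw text
      end
    [name, value]
  end
end

untyped = scan_untyped(SAMPLE_JS)
typed   = scan_typed(SAMPLE_JS)
p untyped["foo"], untyped["bar"]  # prints "42" twice: the distinction is lost
p typed["foo"], typed["bar"]      # prints 42, then "42"
```

The backreference \2 is what keeps a quoted value intact: the closing quote must match the opening one, so the presence of quotes can be recorded rather than silently stripped.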