Tags: javascript, ruby, regex, open-uri, ruby-2.1

Parse data from JavaScript of retrieved page


I'm retrieving a web page with OpenURI:

require 'open-uri'
page = open('http://www.example.com').read.scrub

Now I'd like to extract the values of the keys playerurl, playerdata and pageurl from the retrieved page. They appear inside a <script> tag:

<script>
..
..
  PlayerWatchdog.init({
      'playerurl': 'http://cdn.static.de/now/player.swf?ts=2011354353',
      'playerdata': 'http://www.example.com/player',
      'pageurl': 'http://www.example.com?test=2',
      });
..
..
</script>

What's the smartest way to accomplish this?


Solution

  • Ruby has no built-in JavaScript parser. You can use a regular expression, though it will be sensitive to the page's formatting (for example, it will break if the page switches to double-quoted strings):

    playerurl = page[/'playerurl':\s*'([^']*)'/, 1]
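Building on the one-line regexp above, here is a minimal sketch that pulls out all three values at once and also accepts double-quoted keys and values, which addresses the fragility just mentioned. The helper name `extract_js_values` and the sample `page` string are illustrative assumptions, not part of the original page or any library. The sketch still assumes each value is a plain string literal on one line with no escaped quotes inside it, so it is not a substitute for a real JavaScript parser.

```ruby
# Hypothetical helper: returns a Hash of key => value (nil if the key is absent).
# Assumes each value is a quoted string literal ('...' or "...") containing
# no escaped quotes, written as 'key': value or "key": value.
def extract_js_values(source, *keys)
  keys.each_with_object({}) do |key, result|
    # Group 1 captures the key's opening quote; \1 requires the same closing quote.
    # Group 2 captures the value's opening quote; group 3 is the value itself,
    # matched lazily up to the matching closing quote (\2).
    pattern = /(['"])#{Regexp.escape(key)}\1\s*:\s*(['"])(.*?)\2/
    match = source.match(pattern)
    result[key] = match && match[3]
  end
end

# Sample page source standing in for the string returned by OpenURI.
page = <<-HTML
  <script>
    PlayerWatchdog.init({
        'playerurl': 'http://cdn.static.de/now/player.swf?ts=2011354353',
        'playerdata': 'http://www.example.com/player',
        'pageurl': 'http://www.example.com?test=2',
        });
  </script>
HTML

values = extract_js_values(page, 'playerurl', 'playerdata', 'pageurl')
values.each { |key, value| puts "#{key}: #{value}" }
```

Matching the quote character with a backreference keeps each pattern independent of the page's quoting style, but anything beyond simple string literals (escaped quotes, concatenation, values computed at runtime) would still need a proper parser or a JavaScript engine.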