
Scrape website data for CSV


I'm rather inexperienced with this type of programming and much more familiar with embedded systems. I have very little web programming experience.

What I'd like to achieve:

A website (danglefactory.com) has a great table of statistics that I'd like to download as a CSV for processing. On the website, there is a button that calls an internal script to build a CSV and prepare it for download.

Referer http://www.danglefactory.com/projections/skaters/daily

Script http://www.danglefactory.com/scripts/copy_csv_xls.swf

I'd prefer a Python solution that can fetch this CSV to either temporary or local storage for processing.

Thanks in advance.


Solution

  • The first approach you can take is fairly low-level.

    Under the hood, the page makes JSON API calls that you can reproduce yourself using, for example, requests.

    Here is how you can get the daily projections:

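    import requests
    
    # The "_" query parameter appears to be a cache-busting timestamp added by the page's own script.
    url = 'http://www.danglefactory.com/api/DailySkaterProjections?_=1415200157912'
    response = requests.get(url)
    response.raise_for_status()  # fail loudly on an HTTP error instead of parsing an error page
    
    data = response.json()  # a list of dicts, one per player
    print(data)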
    

    Prints (Python 2 output, shown here pretty-printed):

    [{u'A': 0.61,
      u'Blocks': 0.37,
      u'Corsi': 0.53,
      u'FOL': 9.07,
      u'FOW': 8.95,
      u'FOWinPerc': 49.6,
      u'G': 0.39,
      u'Giveaways': 0.89,
      u'Hits': 0.54,
      u'Name': u'John Tavares',
      u'Opponent': u'ANA',
      u'P': 0.99,
      u'PIM': 0.51,
      u'PPA': 0.24,
      u'PPG': 0.11,
      u'PlayerID': 411,
      u'PlusMinus': 0.05,
      u'PrimaryPosition': u'C',
      u'SHA': 0.0,
      u'SHG': 0.0,
      u'ShPerc': 12.6,
      u'Shots': 3.1,
      u'TOI': 20.39,
      u'Takeaways': 0.82,
      u'Team': u'NYI'},
     {u'A': 0.7,
      u'Blocks': 1.0,
      u'Corsi': 0.47,
      u'FOL': 8.69,
      u'FOW': 8.43,
      u'FOWinPerc': 49.3,
      u'G': 0.28,
      u'Giveaways': 0.84,
      u'Hits': 1.49,
      u'Name': u'Ryan Getzlaf',
      u'Opponent': u'NYI',
      u'P': 0.97,
      u'PIM': 0.68,
      u'PPA': 0.22,
      u'PPG': 0.07,
      u'PlayerID': 161,
      u'PlusMinus': 0.06,
      u'PrimaryPosition': u'C',
      u'SHA': 0.04,
      u'SHG': 0.02,
      u'ShPerc': 11.9,
      u'Shots': 2.3,
      u'TOI': 20.52,
      u'Takeaways': 0.61,
      u'Team': u'ANA'},
    
      ...
    
    }]
    

    You can then convert the results to CSV using the standard csv module.
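As a minimal sketch of that conversion step, assuming Python 3: the function name `write_projections_csv` and the choice of columns are illustrative, not part of the site's API, and the two sample rows are abbreviated from the output above so the example runs without network access.

```python
import csv
import io


def write_projections_csv(rows, fileobj, fieldnames=None):
    """Write a list of dicts (the shape returned by response.json()) as CSV.

    Hypothetical helper for illustration. Returns the number of rows written.
    """
    if not rows:
        return 0
    if fieldnames is None:
        # Use every key that appears in any row, sorted for a stable column order.
        fieldnames = sorted({key for row in rows for key in row})
    # restval fills in columns a row lacks; extrasaction="ignore" skips
    # keys that are not listed in fieldnames.
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames,
                            restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


# Two records abbreviated from the API output shown above.
sample = [
    {"Name": "John Tavares", "Team": "NYI", "Opponent": "ANA",
     "G": 0.39, "A": 0.61, "P": 0.99},
    {"Name": "Ryan Getzlaf", "Team": "ANA", "Opponent": "NYI",
     "G": 0.28, "A": 0.7, "P": 0.97},
]

buffer = io.StringIO()
count = write_projections_csv(
    sample, buffer, fieldnames=["Name", "Team", "Opponent", "G", "A", "P"])
csv_text = buffer.getvalue()
print(csv_text)
```

With live data you would pass the `data` list from `response.json()` instead of `sample`. To keep the result on disk, pass a file opened with `open('projections.csv', 'w', newline='')` in place of the `StringIO` buffer. For temporary storage, `tempfile.NamedTemporaryFile` with `mode='w'` and `newline=''` should work the same way.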


    Another option would be a browser automation tool such as selenium, but the CSV button and the table are inside a Flash object, which selenium cannot interact with.


    You could, though, use an image recognition and screen automation tool like sikuli to find the CSV button and click it, if you still want to stay "high-level".

    Hope that helps.