python, dataframe, machine-learning, web-scraping, data-science

How do I scrape graph data from a web page?


I'm currently working on a small personal machine learning project to predict action figure prices. Obviously, the first thing I need is data.

I want to collect the price history data from here: https://www.actionfigure411.com/transformers/war-for-cybertron-siege-series/deluxe-class/ironhide-3664.php

I've been stuck on this for a couple of hours, so any advice is appreciated.

By the way, this is just one item. Is there a way to scrape multiple items efficiently?

I'd also appreciate any suggestions on handling the data, such as choosing the features and the target variable.

I tried to figure out how to collect the graph data, but it didn't work out.


Solution

  • The data is embedded in the page inside a <script> tag. To parse it, you can use the following example:

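    import re
    import json
    import requests
    
    url = "https://www.actionfigure411.com/transformers/war-for-cybertron-siege-series/deluxe-class/ironhide-3664.php"
    
    # Download the page and fail loudly on HTTP errors instead of parsing an error page
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    html_doc = response.text
    
    # The chart data sits in a <script> tag as "jsonData = {...}" followed by a blank line
    match = re.search(r"jsonData = ({.*?})\n\s*\n", html_doc, flags=re.S)
    if match is None:
        raise ValueError("jsonData not found; the page layout may have changed")
    data = json.loads(match.group(1))
    
    # Each row holds [date, price]; the dates are "Date(year, month)" strings with 0-based months
    for row in data['rows']:
        print('{:<20} ${:<10}'.format(row['c'][0]['v'], row['c'][1]['v']))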
    

    Prints:

    Date(2021, 10)       $71        
    Date(2021, 11)       $87        
    Date(2022, 0)        $90        
    Date(2022, 1)        $89        
    Date(2022, 2)        $82        
    Date(2022, 5)        $58        
    Date(2022, 8)        $50        
    Date(2022, 9)        $29        
    Date(2022, 10)       $57        
    Date(2022, 11)       $42        
    Date(2023, 0)        $42        
    Date(2023, 1)        $100
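    If you want real dates and several items, a minimal sketch is to wrap the extraction in a function and reuse one `requests.Session` across URLs. This assumes every item page embeds its chart the same way as the Ironhide page, which I haven't verified. The `Date(2021, 10)` strings use zero-based months (note `Date(2022, 0)` in the output above, which looks like the Google Charts date convention), so the helper adds 1 to the month and pins each point to the first day of that month. The names `parse_chart_date`, `extract_price_history` and `scrape_many`, plus the 1-second delay, are my own illustrative choices, not part of any library.

```python
import json
import re
import time
from datetime import date

# Same pattern as the snippet above; assumes the page embeds its chart data
# as "jsonData = {...}" followed by a blank line.
JSON_DATA_RE = re.compile(r"jsonData = ({.*?})\n\s*\n", flags=re.S)
# Matches date strings such as "Date(2021, 10)" or "Date(2021, 10, 5)".
DATE_RE = re.compile(r"Date\((\d+),\s*(\d+)(?:,\s*(\d+))?")


def parse_chart_date(value):
    """Convert 'Date(2021, 10)' to date(2021, 11, 1); the month is zero-based."""
    m = DATE_RE.match(value)
    if m is None:
        raise ValueError(f"unexpected date format: {value!r}")
    year, month, day = m.group(1), m.group(2), m.group(3)
    return date(int(year), int(month) + 1, int(day) if day else 1)


def extract_price_history(html_doc):
    """Return a list of (date, price) tuples from the page's embedded chart data."""
    match = JSON_DATA_RE.search(html_doc)
    if match is None:
        return []  # no chart data found (layout changed or no price history)
    data = json.loads(match.group(1))
    return [
        (parse_chart_date(row["c"][0]["v"]), row["c"][1]["v"])
        for row in data["rows"]
    ]


def scrape_many(urls, delay=1.0):
    """Fetch several item pages with one shared Session, pausing between requests."""
    import requests  # only needed for fetching; the helpers above are stdlib-only

    results = {}
    with requests.Session() as session:
        for url in urls:
            response = session.get(url, timeout=10)
            response.raise_for_status()
            results[url] = extract_price_history(response.text)
            time.sleep(delay)  # be polite to the server between requests
    return results
```

    Call `scrape_many([...])` with the item URLs you want; it returns a dict mapping each URL to a list of `(date, price)` tuples. Each list can go straight into `pandas.DataFrame(history, columns=["date", "price"])` if you prefer a DataFrame for the modelling step.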