python dataframe machine-learning web-scraping data-science

How do I scrape the data behind a graph on a web page?

I'm working on a small personal machine learning project to predict action figure prices. The first thing I need, obviously, is data.

I want to collect the price history data from here: https://www.actionfigure411.com/transformers/war-for-cybertron-siege-series/deluxe-class/ironhide-3664.php

I have been stuck on this for a couple of hours and would appreciate any advice.

By the way, this is just one item. I wonder whether there is a way to do this efficiently for multiple items.

I would also appreciate any suggestions on handling the data, such as which features and target variable to use.

I tried to work out how to collect the data behind the graph, but without success.

Solution

The data is embedded in the page inside a <script> tag. To parse it, you can use the following example:

import re
import json
import requests

url = "https://www.actionfigure411.com/transformers/war-for-cybertron-siege-series/deluxe-class/ironhide-3664.php"

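# Download the page; fail early on an HTTP error instead of parsing an error page.
response = requests.get(url, timeout=30)
response.raise_for_status()
html_doc = response.text

# The chart data is assigned to a JavaScript variable named jsonData; capture the object literal.
data = re.search(r"jsonData = ({.*?})\n\s*\n", html_doc, flags=re.S).group(1)
data = json.loads(data)

# Each row keeps its cells under "c": cell 0 is the date, cell 1 is the price.
for row in data['rows']:
    print('{:<20} ${:<10}'.format(row['c'][0]['v'], row['c'][1]['v']))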

Prints:

Date(2021, 10)       $71        
Date(2021, 11)       $87        
Date(2022, 0)        $90        
Date(2022, 1)        $89        
Date(2022, 2)        $82        
Date(2022, 5)        $58        
Date(2022, 8)        $50        
Date(2022, 9)        $29        
Date(2022, 10)       $57        
Date(2022, 11)       $42        
Date(2023, 0)        $42        
Date(2023, 1)        $100
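
To handle several items and get usable dates, the same extraction can be wrapped in functions. The following is a minimal sketch, not a definitive implementation. It assumes that every item page embeds its chart data in the same jsonData layout, and that the month in Date(year, month) is zero-based as in JavaScript (the output above includes month 0, which suggests this). The names parse_chart_date, extract_price_history and scrape_many, and the one-second delay, are hypothetical choices made for illustration.

```python
import json
import re
import time
from datetime import date

# Same pattern as in the example above; it depends on the page's current layout.
JSON_PATTERN = re.compile(r"jsonData = ({.*?})\n\s*\n", flags=re.S)
DATE_PATTERN = re.compile(r"Date\((\d+),\s*(\d+)(?:,\s*(\d+))?")


def parse_chart_date(value):
    """Convert a string such as 'Date(2022, 0)' to datetime.date(2022, 1, 1)."""
    match = DATE_PATTERN.match(value)
    if match is None:
        raise ValueError(f"unexpected date value: {value!r}")
    year, month, day = match.groups()
    # Assumption: months are zero-based, so add 1; use day 1 when no day is given.
    return date(int(year), int(month) + 1, int(day) if day else 1)


def extract_price_history(html_doc):
    """Return [{'date': date, 'price': number}, ...] for one page's HTML."""
    match = JSON_PATTERN.search(html_doc)
    if match is None:
        return []  # no embedded chart data found on this page
    data = json.loads(match.group(1))
    return [
        {"date": parse_chart_date(row["c"][0]["v"]), "price": row["c"][1]["v"]}
        for row in data["rows"]
    ]


def scrape_many(urls, delay=1.0):
    """Fetch several item pages and tag every record with its source URL."""
    import requests  # imported here so the parsing helpers work without it

    records = []
    with requests.Session() as session:  # reuse one connection for all pages
        for url in urls:
            response = session.get(url, timeout=30)
            response.raise_for_status()
            for record in extract_price_history(response.text):
                records.append({"url": url, **record})
            time.sleep(delay)  # pause between requests to go easy on the site
    return records
```

Calling scrape_many with a list of item URLs returns a flat list of dicts with the keys url, date and price. That list can be passed directly to pandas.DataFrame(records) for further processing.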