I'm working on a small personal machine learning project to predict action figure prices, and the first thing I need is data.
I want to collect the price history data from here: https://www.actionfigure411.com/transformers/war-for-cybertron-siege-series/deluxe-class/ironhide-3664.php
I have been stuck on this for a couple of hours, and I would appreciate any advice.
By the way, this is just one item; I wonder if there is a way to handle multiple items efficiently.
I would also appreciate any suggestions about handling the data, such as features, the target variable, etc.
I tried to figure out how to collect the graph data, but it didn't work out.
The data is embedded in the page inside a <script> tag. To parse it, you can use the following example:
import re
import json
import requests

url = "https://www.actionfigure411.com/transformers/war-for-cybertron-siege-series/deluxe-class/ironhide-3664.php"
html_doc = requests.get(url).text

# Grab the JSON object assigned to jsonData inside the <script> tag
data = re.search(r"jsonData = ({.*?})\n\s*\n", html_doc, flags=re.S).group(1)
data = json.loads(data)

# Each row holds two cells: the date string and the price
for row in data['rows']:
    print('{:<20} ${:<10}'.format(row['c'][0]['v'], row['c'][1]['v']))
Prints:
Date(2021, 10)       $71
Date(2021, 11)       $87
Date(2022, 0)        $90
Date(2022, 1)        $89
Date(2022, 2)        $82
Date(2022, 5)        $58
Date(2022, 8)        $50
Date(2022, 9)        $29
Date(2022, 10)       $57
Date(2022, 11)       $42
Date(2023, 0)        $42
Date(2023, 1)        $100
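Two follow-ups that may help. The months in the `Date(...)` strings are zero-based (note the `Date(2022, 0)` entry), so `Date(2021, 10)` is November 2021. For multiple items, the same parsing can be wrapped in a function and run over a list of URLs. Below is a rough sketch along those lines. I have not run it against the live site, and it assumes every item page embeds `jsonData` the same way as the Ironhide page. The names `parse_chart_date`, `extract_price_history`, `collect_histories` and the `delay_seconds` parameter are made up for illustration.

```python
import json
import re
import time
from datetime import date

# Same pattern as in the snippet above; assumes the page layout is unchanged.
JSON_PATTERN = re.compile(r"jsonData = ({.*?})\n\s*\n", flags=re.S)
# Matches strings such as "Date(2021, 10)"; the month is zero-based.
DATE_PATTERN = re.compile(r"Date\((\d+),\s*(\d+)")


def parse_chart_date(value):
    """Turn 'Date(2021, 10)' into date(2021, 11, 1)."""
    match = DATE_PATTERN.match(value)
    if match is None:
        raise ValueError(f"unexpected date format: {value!r}")
    year, month = int(match.group(1)), int(match.group(2))
    return date(year, month + 1, 1)  # shift to a one-based month


def extract_price_history(html_doc):
    """Return a list of (date, price) tuples from one item page's HTML."""
    match = JSON_PATTERN.search(html_doc)
    if match is None:
        return []  # no embedded chart data found on this page
    data = json.loads(match.group(1))
    return [
        (parse_chart_date(row["c"][0]["v"]), row["c"][1]["v"])
        for row in data["rows"]
    ]


def collect_histories(urls, delay_seconds=1.0):
    """Fetch several item pages and map each URL to its price history."""
    # Imported here so the parsing helpers work without requests installed.
    import requests

    histories = {}
    for url in urls:
        histories[url] = extract_price_history(requests.get(url).text)
        time.sleep(delay_seconds)  # pause between requests to go easy on the site
    return histories
```

Calling `collect_histories([url])` with the URL from the question should give a dictionary with one entry whose value is a list of `(date, price)` pairs. From there, the price column is a natural candidate for the target variable, but that part depends on what you want to predict.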