I have Amazon price data for around 8.5k products covering the period from Feb. 1, 2015 to Oct. 31, 2015. Currently, it is in the form of a dictionary where each key is the number of days from the base date and each value is the new price starting that day. For example, here the price is $10 from day 1, changes to $15 on the 45th day, then changes to $9 on the 173rd day and doesn't change after that.
{1:10,
45:15,
.
.
.
173:9}
What is the best way to store such a time series for easy manipulation in Python? I would like to perform a lot of aggregations and also query the price on a particular date. Lastly, I will be running some fixed-effects regressions, and I am unsure how best to store this time series so that the programming becomes comparatively simpler. I could store it as a table with 273 columns (one per day) and rows corresponding to the 8.5k products. I've been looking at the pandas module, which can help me do this, but is there a better way? Thanks!
You could use a dict of dicts, convert it into a pandas DataFrame, and use numpy for the calculations. The outer key would be the product and the inner dict would be the one you already have. By default it won't print in the format you suggested, because the products end up as columns, but all you need to do is transpose it. A quick example:
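import pandas as pd

# Outer key: product; inner dict: {day offset: new price starting that day}
d = {'Product1': {1: 10, 45: 15, 173: 9}, 'Product2': {1: 11, 100: 50, 173: 10}}
# Transpose so that products become rows and days become columns
df = pd.DataFrame(d).T
print(df)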
            1   45  100  173
Product1   10   15  NaN    9
Product2   11  NaN   50   10
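The NaN entries above just mean "no price change recorded on that day", so to look up the price on an arbitrary date you would probably want to forward-fill them. Here is a minimal sketch of one way to do that, not the only one. It assumes day 1 corresponds to Feb. 1, 2015 (the question does not say what the base date is), and the names `N_DAYS`, `changes`, `daily` and `long_df` are made up for illustration.

```python
import pandas as pd

# Same toy data as above: {product: {day offset: new price starting that day}}
d = {'Product1': {1: 10, 45: 15, 173: 9}, 'Product2': {1: 11, 100: 50, 173: 10}}

# Feb. 1 to Oct. 31, 2015 is 273 days (the 273 columns from the question)
N_DAYS = 273

# Products as rows, change days as columns, with the days sorted explicitly
changes = pd.DataFrame(d).T.sort_index(axis=1)

# One column per day, then carry the last known price forward along each row
daily = changes.reindex(columns=range(1, N_DAYS + 1)).ffill(axis=1)

# Price in effect on day 60 for one product
print(daily.loc['Product1', 60])

# Optional: label the columns with calendar dates.
# ASSUMPTION: day 1 is Feb. 1, 2015; adjust the start date if your base date differs.
daily.columns = pd.date_range('2015-02-01', periods=N_DAYS)
print(daily.loc['Product1', pd.Timestamp('2015-04-01')])

# Aggregation across days, e.g. the mean price per product over the period
print(daily.mean(axis=1))

# Long format (one row per product and date), which is the shape most
# regression tools expect for panel data with product or date fixed effects
long_df = (daily.rename_axis('product')
                .reset_index()
                .melt(id_vars='product', var_name='date', value_name='price'))
print(long_df.head())
```

With 8.5k products and 273 days the dense table has only about 2.3 million cells, so keeping the fully filled version in memory should not be a problem. The wide `daily` frame is convenient for date lookups and per-product aggregations, while `long_df` is usually easier to feed into a regression.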