I am using an API that returns a dictionary with a nested list inside, lets name it coins_best
The result looks like this:
{'bitcoin': [[1603782192402, 13089.646908288987],
[1603865643028, 13712.070136258053]],
'ethereum': [[1603782053064, 393.6741989091851],
[1603865024078, 404.86117057956386]]}
The first value on the list is a timestamp, while the second is a price in dollars. I want to create a DataFrame with the prices and having the timestamps as index. I tried with this code to do it in just one step:
d = pd.DataFrame()
for id, obj in coins_best.items():
for i in range(0,len(obj)):
temp = pd.DataFrame({
obj[i][1]
}
)
d = pd.concat([d, temp])
d
This attempt gave me a DataFrame with just one column and not the two required, because using the columns
argument threw errors (TypeError: Index(...) must be called with a collection of some kind, 'bitcoin' was passed) when I tried with id
Then I tried with comprehensions to preprocess the dictionary and their lists:
for k in coins_best.keys():
inner_lists = (coins_best[k] for inner_dict in coins_best.values())
items = (item[1] for ls in inner_lists for item in ls)
I could not obtain the both elements in the dictionary, just the last.
I know is possible to try:
df = pd.DataFrame(coins_best, columns=coins_best.keys())
Which gives me:
bitcoin ethereum
0 [1603782192402, 13089.646908288987] [1603782053064, 393.6741989091851]
1 [1603785693143, 13146.275972229188] [1603785731599, 394.6174435303511]
And then try to remove the first element in every list of every row, but was even harder to me. The required answer is:
bitcoin ethereum
1603782192402 13089.646908288987 393.6741989091851
1603785693143 13146.275972229188 394.6174435303511
Do you know how to process the dictionary before creating the DataFrame in order the get this result?
Is my first question, I tried to be as clear as possible. Thank you very much.
Update #1
The answer by Sander van den Oord also solved the problem of timestamps and is useful for its purpose. However, the sample code while correct (as it used the info provided) was limited to these two keys. This is the final code that solved the problem for every key in the dictionary.
for k in coins_best:
df_coins1 = pd.DataFrame(data=coins_best[k], columns=['timestamp', k])
df_coins1['timestamp'] = pd.to_datetime(df_coins1['timestamp'], unit='ms')
df_coins = pd.concat([df_coins1, df_coins], sort=False)
df_coins_resampled = df_coins.set_index('timestamp').resample('d').mean()
Thank you very much for your answers.
I think you shouldn't ignore the fact that values of coins are taken at different times. You could do something like this:
import pandas as pd
import hvplot.pandas
coins_best = {
'bitcoin': [[1603782192402, 13089.646908288987],
[1603865643028, 13712.070136258053]],
'ethereum': [[1603782053064, 393.6741989091851],
[1603865024078, 404.86117057956386]],
}
df_bitcoin = pd.DataFrame(data=coins_best['bitcoin'], columns=['timestamp', 'bitcoin'])
df_bitcoin['timestamp'] = pd.to_datetime(df_bitcoin['timestamp'], unit='ms')
df_ethereum = pd.DataFrame(data=coins_best['ethereum'], columns=['timestamp', 'ethereum'])
df_ethereum['timestamp'] = pd.to_datetime(df_ethereum['timestamp'], unit='ms')
df_coins = pd.concat([df_ethereum, df_bitcoin], sort=False)
Your df_coins
will now look like this:
+----+----------------------------+------------+-----------+
| | timestamp | ethereum | bitcoin |
|----+----------------------------+------------+-----------|
| 0 | 2020-10-27 07:00:53.064000 | 393.674 | nan |
| 1 | 2020-10-28 06:03:44.078000 | 404.861 | nan |
| 0 | 2020-10-27 07:03:12.402000 | nan | 13089.6 |
| 1 | 2020-10-28 06:14:03.028000 | nan | 13712.1 |
+----+----------------------------+------------+-----------+
Now if you want values to be on the same line, you could use resampling, here I do it per day: all values of the same day for a coin type are averaged:
df_coins_resampled = df_coins.set_index('timestamp').resample('d').mean()
df_coins_resampled
will look like this:
+---------------------+------------+-----------+
| timestamp | ethereum | bitcoin |
|---------------------+------------+-----------|
| 2020-10-27 00:00:00 | 393.674 | 13089.6 |
| 2020-10-28 00:00:00 | 404.861 | 13712.1 |
+---------------------+------------+-----------+
I like to use hvplot to get an interactive plot of the result:
df_coins_resampled.hvplot.scatter(
x='timestamp',
y=['bitcoin', 'ethereum'],
s=20, padding=0.1
)