Search code examples
pythonpandasdataframelist-comprehensiondictionary-comprehension

Get just one value of a nested list inside a dictionary to create a Dataframe Update #1


I am using an API that returns a dictionary with a nested list inside, lets name it coins_best The result looks like this:

{'bitcoin': [[1603782192402, 13089.646908288987],
  [1603865643028, 13712.070136258053]],
 'ethereum': [[1603782053064, 393.6741989091851],
  [1603865024078, 404.86117057956386]]}

The first value on the list is a timestamp, while the second is a price in dollars. I want to create a DataFrame with the prices and having the timestamps as index. I tried with this code to do it in just one step:

d = pd.DataFrame()
for id, obj in coins_best.items():
    for i in range(0,len(obj)):
        temp = pd.DataFrame({
            obj[i][1]
            }
        )
        d = pd.concat([d, temp])
d

This attempt gave me a DataFrame with just one column and not the two required, because using the columns argument threw errors (TypeError: Index(...) must be called with a collection of some kind, 'bitcoin' was passed) when I tried with id

Then I tried with comprehensions to preprocess the dictionary and their lists:

for k in coins_best.keys():
    inner_lists = (coins_best[k] for inner_dict in coins_best.values())

items = (item[1] for ls in inner_lists for item in ls)

I could not obtain the both elements in the dictionary, just the last.

I know is possible to try:

df = pd.DataFrame(coins_best, columns=coins_best.keys())

Which gives me:

                 bitcoin                              ethereum
0   [1603782192402, 13089.646908288987] [1603782053064, 393.6741989091851]
1   [1603785693143, 13146.275972229188] [1603785731599, 394.6174435303511]

And then try to remove the first element in every list of every row, but was even harder to me. The required answer is:

                      bitcoin            ethereum
1603782192402   13089.646908288987   393.6741989091851
1603785693143   13146.275972229188   394.6174435303511

Do you know how to process the dictionary before creating the DataFrame in order the get this result?

Is my first question, I tried to be as clear as possible. Thank you very much.

Update #1

The answer by Sander van den Oord also solved the problem of timestamps and is useful for its purpose. However, the sample code while correct (as it used the info provided) was limited to these two keys. This is the final code that solved the problem for every key in the dictionary.

for k in coins_best:
    df_coins1 = pd.DataFrame(data=coins_best[k], columns=['timestamp', k])
    df_coins1['timestamp'] = pd.to_datetime(df_coins1['timestamp'], unit='ms')
    df_coins = pd.concat([df_coins1, df_coins], sort=False)

df_coins_resampled = df_coins.set_index('timestamp').resample('d').mean()

Thank you very much for your answers.


Solution

  • I think you shouldn't ignore the fact that values of coins are taken at different times. You could do something like this:

    import pandas as pd
    import hvplot.pandas
    
    coins_best = {
        'bitcoin': [[1603782192402, 13089.646908288987],
                   [1603865643028, 13712.070136258053]],
       'ethereum': [[1603782053064, 393.6741989091851],
                   [1603865024078, 404.86117057956386]],
    }
    
    df_bitcoin = pd.DataFrame(data=coins_best['bitcoin'], columns=['timestamp', 'bitcoin'])
    df_bitcoin['timestamp'] = pd.to_datetime(df_bitcoin['timestamp'], unit='ms')
    
    
    df_ethereum = pd.DataFrame(data=coins_best['ethereum'], columns=['timestamp', 'ethereum'])
    df_ethereum['timestamp'] = pd.to_datetime(df_ethereum['timestamp'], unit='ms')
    
    df_coins = pd.concat([df_ethereum, df_bitcoin], sort=False)
    

    Your df_coins will now look like this:

    +----+----------------------------+------------+-----------+
    |    | timestamp                  |   ethereum |   bitcoin |
    |----+----------------------------+------------+-----------|
    |  0 | 2020-10-27 07:00:53.064000 |    393.674 |     nan   |
    |  1 | 2020-10-28 06:03:44.078000 |    404.861 |     nan   |
    |  0 | 2020-10-27 07:03:12.402000 |    nan     |   13089.6 |
    |  1 | 2020-10-28 06:14:03.028000 |    nan     |   13712.1 |
    +----+----------------------------+------------+-----------+
    

    Now if you want values to be on the same line, you could use resampling, here I do it per day: all values of the same day for a coin type are averaged:

    df_coins_resampled = df_coins.set_index('timestamp').resample('d').mean()
    

    df_coins_resampled will look like this:

    +---------------------+------------+-----------+
    | timestamp           |   ethereum |   bitcoin |
    |---------------------+------------+-----------|
    | 2020-10-27 00:00:00 |    393.674 |   13089.6 |
    | 2020-10-28 00:00:00 |    404.861 |   13712.1 |
    +---------------------+------------+-----------+
    

    I like to use hvplot to get an interactive plot of the result:

    df_coins_resampled.hvplot.scatter(
        x='timestamp', 
        y=['bitcoin', 'ethereum'], 
        s=20, padding=0.1
    )
    

    Resulting plot: plotting bitcoins over time