
Converting a dictionary of lists to a pandas.DataFrame using predefined headers


I have a dictionary that looks like the following:

date_pair_dict = {

    "15-02-2022 15-02-2022": ["key 1 val 1", "key 1 val 2", "key 1 val 3"],
    "15-02-2022 16-02-2022": ["key 2 val 1", "key 2 val 2", "key 2 val 3"],
    "16-02-2022 16-02-2022": ["key 3 val 1", "key 3 val 2", "key 3 val 3"],
    "16-02-2022 17-02-2022": ["key 4 val 1", "key 4 val 2", "key 4 val 3"]

}

And a list of headers:

headers = ["date pair header", "header val 1", "header val 2", "header val 3"]

I would like to create a pandas.DataFrame and write it to Excel in the following format (expected output):

date pair header       header val 1  header val 2  header val 3
15-02-2022 15-02-2022  key 1 val 1   key 1 val 2   key 1 val 3
15-02-2022 16-02-2022  key 2 val 1   key 2 val 2   key 2 val 3
16-02-2022 16-02-2022  key 3 val 1   key 3 val 2   key 3 val 3
16-02-2022 17-02-2022  key 4 val 1   key 4 val 2   key 4 val 3

Right now, I'm using this (arguably very sad) method:

import pandas

date_pair_dict = {

    "15-02-2022 15-02-2022": ["key 1 val 1", "key 1 val 2", "key 1 val 3"],
    "15-02-2022 16-02-2022": ["key 2 val 1", "key 2 val 2", "key 2 val 3"],
    "16-02-2022 16-02-2022": ["key 3 val 1", "key 3 val 2", "key 3 val 3"],
    "16-02-2022 17-02-2022": ["key 4 val 1", "key 4 val 2", "key 4 val 3"]

}

headers = ["date pair header", "header val 1", "header val 2", "header val 3"]

list_of_keys, list_of_val_1, list_of_val_2, list_of_val_3 = [], [], [], []

for key in date_pair_dict.keys():

    list_of_keys.append(key)

    val_1, val_2, val_3 = date_pair_dict.get(key)

    list_of_val_1.append(val_1)
    list_of_val_2.append(val_2)
    list_of_val_3.append(val_3)

dataframe = pandas.DataFrame(
    {
        headers[0]: list_of_keys,
        headers[1]: list_of_val_1,
        headers[2]: list_of_val_2,
        headers[3]: list_of_val_3,
    }
)

This is not scalable at all. In reality, date_pair_dict can have any number of keys, each mapping to a list of any length. However, all of these lists always have the same length, and that length is known beforehand (I always predefine the headers list).

Additionally, I suspect my loop risks producing a dataframe whose rows are not in the same order as the original keys, because I iterate like this:

for key in date_pair_dict.keys():

    ...

The keys are date pairs, and need to remain in order when used as the first column of the dataframe.
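On the ordering worry: since Python 3.7, dicts are guaranteed to preserve insertion order, so iterating over keys() or items() yields the keys in the order they were added. The sketch below shows that, plus a fallback for the case where the dict was not built chronologically. The helper pair_sort_key is hypothetical and assumes every key is two DD-MM-YYYY dates separated by a single space.

```python
from datetime import datetime

date_pair_dict = {
    "15-02-2022 15-02-2022": ["key 1 val 1", "key 1 val 2", "key 1 val 3"],
    "15-02-2022 16-02-2022": ["key 2 val 1", "key 2 val 2", "key 2 val 3"],
    "16-02-2022 16-02-2022": ["key 3 val 1", "key 3 val 2", "key 3 val 3"],
    "16-02-2022 17-02-2022": ["key 4 val 1", "key 4 val 2", "key 4 val 3"],
}

# Python 3.7+ guarantees insertion order, so this matches the literal above.
ordered_keys = list(date_pair_dict.keys())


def pair_sort_key(pair):
    # Hypothetical helper: assumes "DD-MM-YYYY DD-MM-YYYY" keys.
    # Returns a list of datetimes; lists compare element by element.
    return [datetime.strptime(d, "%d-%m-%Y") for d in pair.split()]


# Only needed if the dict was NOT filled in chronological order.
chronological_keys = sorted(date_pair_dict, key=pair_sort_key)

print(ordered_keys == chronological_keys)  # True
```

Sorting by parsed dates rather than by the raw strings matters because DD-MM-YYYY strings do not sort chronologically across months or years.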

Is there a better way to do this, preferably using a dictionary comprehension?
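For reference, a minimal sketch of the dictionary-comprehension version, generalizing the column-wise construction from the question to any number of headers. It assumes every list has exactly len(headers) - 1 values and raises ValueError otherwise; the names n_values and columns are illustrative only.

```python
import pandas as pd

date_pair_dict = {
    "15-02-2022 15-02-2022": ["key 1 val 1", "key 1 val 2", "key 1 val 3"],
    "15-02-2022 16-02-2022": ["key 2 val 1", "key 2 val 2", "key 2 val 3"],
    "16-02-2022 16-02-2022": ["key 3 val 1", "key 3 val 2", "key 3 val 3"],
    "16-02-2022 17-02-2022": ["key 4 val 1", "key 4 val 2", "key 4 val 3"],
}
headers = ["date pair header", "header val 1", "header val 2", "header val 3"]

# Assumption: each list holds one value per header after the first.
n_values = len(headers) - 1
for key, values in date_pair_dict.items():
    if len(values) != n_values:
        raise ValueError(f"{key!r} has {len(values)} values, expected {n_values}")

# First column: the keys, in insertion order.
columns = {headers[0]: list(date_pair_dict)}
# Dictionary comprehension: column i collects the i-th element of every list.
columns.update({
    header: [values[i] for values in date_pair_dict.values()]
    for i, header in enumerate(headers[1:])
})

df = pd.DataFrame(columns)  # column order follows the dict's key order
print(df)
```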


Solution

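  • As you suggested, you can use a comprehension over your dict's key/value pairs. Here it is a list comprehension that builds one row per key ([k] + v) and passes your headers as the column names: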

    import pandas as pd
    
    date_pair_dict = {
    
        "15-02-2022 15-02-2022": ["key 1 val 1", "key 1 val 2", "key 1 val 3"],
        "15-02-2022 16-02-2022": ["key 2 val 1", "key 2 val 2", "key 2 val 3"],
        "16-02-2022 16-02-2022": ["key 3 val 1", "key 3 val 2", "key 3 val 3"],
        "16-02-2022 17-02-2022": ["key 4 val 1", "key 4 val 2", "key 4 val 3"]
    
    }
    
    headers = ["date pair header", "header val 1", "header val 2", "header val 3"]
    
    df = pd.DataFrame([[k] + v for k,v in date_pair_dict.items()], columns=headers)
    print(df)
    

    Output:

            date pair header header val 1 header val 2 header val 3
    0  15-02-2022 15-02-2022  key 1 val 1  key 1 val 2  key 1 val 3
    1  15-02-2022 16-02-2022  key 2 val 1  key 2 val 2  key 2 val 3
    2  16-02-2022 16-02-2022  key 3 val 1  key 3 val 2  key 3 val 3
    3  16-02-2022 17-02-2022  key 4 val 1  key 4 val 2  key 4 val 3
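Another option, sketched here as an alternative rather than as part of the answer above, is pandas.DataFrame.from_dict with orient="index": each key becomes a row label and its list becomes that row's values. The columns argument is only accepted together with orient="index", so the first header is attached afterwards by naming the index and resetting it into a regular column.

```python
import pandas as pd

date_pair_dict = {
    "15-02-2022 15-02-2022": ["key 1 val 1", "key 1 val 2", "key 1 val 3"],
    "15-02-2022 16-02-2022": ["key 2 val 1", "key 2 val 2", "key 2 val 3"],
    "16-02-2022 16-02-2022": ["key 3 val 1", "key 3 val 2", "key 3 val 3"],
    "16-02-2022 17-02-2022": ["key 4 val 1", "key 4 val 2", "key 4 val 3"],
}
headers = ["date pair header", "header val 1", "header val 2", "header val 3"]

df = (
    # Keys become the row index; each list fills the value columns.
    pd.DataFrame.from_dict(date_pair_dict, orient="index", columns=headers[1:])
    .rename_axis(headers[0])  # name the index after the first header
    .reset_index()            # turn the date pairs into the first regular column
)
print(df)
```

To write either result to Excel without the numeric row index, df.to_excel(path, index=False) should work; it needs an Excel writer engine such as openpyxl installed.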