Changing list to dataframe in dictionary


I am writing code that has to separate a dataframe into multiple smaller dataframes, based on the items that are repeated in the list calvo_massflow. If an item hasn't been seen before, the code creates a new list for it in the dictionary. In the second for loop, the row of the df dataframe at index i is added to one of the dictionary lists, if the key (l) and e are the same.

This is what I currently have:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from scipy.stats import linregress
from scipy.optimize import curve_fit

calvo_massflow = [1, 2, 1, 2, 2, 1, 1]

df = pd.DataFrame({"a":[1, 2, 3, 4, 11, 2, 4, 6, 7, 3],
                   "b":[5, 6, 7, 8, 10, 44, 23, 267, 4, 66]})


dic = {}
massflows = []
for i, e in enumerate(calvo_massflow):
    if e not in massflows:
        massflows.append(e)
        dic[e] = []
    for l in dic:
        if e == l:
            dic[e].append(pd.DataFrame([df.iloc[i]]))

The problem with the output is that each row ends up as a separate dataframe in the dictionary. I would like to have all the dataframes combined. I tried doing something with pd.concat, but I couldn't figure it out. Moreover, the values in the dictionary (if that's what you call them) are lists, and I would prefer them to be dataframes. However, if I change the list to a dataframe, like I did here:

dic3 = {}
massflows = []
for i, e in enumerate(calvo_massflow):
    if e not in massflows:
        massflows.append(e)
        dic3[e] = pd.DataFrame([])
    for l in dic3:
        if e == l:
            dic3[e].append(df.iloc[i])

I can't seem to add rows to the dataframes stored in the dictionary.

My ideal scenario would be a dictionary with two dataframes, one with the key 1 and one with the key 2. Each of those dataframes should include all the information from the corresponding rows of the dataframe df, instead of a separate dataframe for each index as it is right now. Preferably the dataframes aren't wrapped in lists like they are now, but it won't be a disaster if they are.

Let me know if you guys can help me out or need more context!


Solution

  • IIUC, you want to select the rows of df up to the length of calvo_massflow, group them by calvo_massflow and convert the result to a dict. That could look like this:

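    import pandas as pd

    calvo_massflow = [1, 2, 1, 2, 2, 1, 1]
    
    df = pd.DataFrame({"a":[1, 2, 3, 4, 11, 2, 4, 6, 7, 3],
                       "b":[5, 6, 7, 8, 10, 44, 23, 267, 4, 66]})
    
    # Keep only the first len(calvo_massflow) rows, group them by the list
    # values, and turn the (key, sub-DataFrame) pairs into a dict.
    dic = dict(iter(df.iloc[:len(calvo_massflow)]
                    .groupby(calvo_massflow)))
    print(dic)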
    

    resulting in a dictionary with keys 1 and 2 containing two filtered DataFrames:

    {1:    a   b
     0  1   5
     2  3   7
     5  2  44
     6  4  23,
     2:     a   b
     1   2   6
     3   4   8
     4  11  10}
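
  • If you would rather keep the loop from the question, a minimal sketch of the pd.concat route could look like the block below. The names pieces and dic_concat are made up for illustration and do not come from the question. Note that in the second attempt, DataFrame.append returned a new DataFrame instead of changing the stored one, so the result was thrown away; the method was also removed in pandas 2.0, which is another reason to collect the rows first and concatenate once.

```python
import pandas as pd

calvo_massflow = [1, 2, 1, 2, 2, 1, 1]

df = pd.DataFrame({"a": [1, 2, 3, 4, 11, 2, 4, 6, 7, 3],
                   "b": [5, 6, 7, 8, 10, 44, 23, 267, 4, 66]})

# Step 1: collect one-row DataFrames per key, like the first loop in the question.
# "pieces" is a hypothetical helper name used only in this sketch.
pieces = {}
for i, e in enumerate(calvo_massflow):
    # df.iloc[[i]] (double brackets) keeps row i as a one-row DataFrame
    pieces.setdefault(e, []).append(df.iloc[[i]])

# Step 2: merge each list of one-row DataFrames into a single DataFrame per key.
dic_concat = {key: pd.concat(frames) for key, frames in pieces.items()}

print(dic_concat[1])
print(dic_concat[2])
```

    For this input it should give the same two DataFrames as the groupby version above; the groupby one-liner is shorter, while the loop may be easier to adapt if you need extra logic per row.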