Search code examples
pythondictionarypandasdefaultdict

How to copy unique keys and values from another dictionary in Python


I have a dataframe df with transactions where the values in the column Col can be repeated. I use Counter dictionary1 to count the frequency for each Col value, then I would like to run a for loop on a subset of the data and obtain a value pit. I want to create a new dictionary dict1 where the key is the key from dictionary1 and the value is the value of pit. This is the code I have so far:

dictionary1 = Counter(df['Col'])
dict1 = defaultdict(int)

for i in range(len(dictionary1)):       
    temp = df[df['Col'] == dictionary1.keys()[i]]
    b = temp['IsBuy'].sum()
    n = temp['IsBuy'].count()
    pit = b/n
    dict1[dictionary1.keys()[i]] = pit

My question is, how can i assign the key and value for dict1 based on the key of dictionary1 and the value obtained from the calculation of pit. In other words, what is the correct way to write the last line of code in the above script.

Thank you.


Solution

  • Since you're using pandas, I should point out that the problem you're facing is common enough that there's a built-in way to do it. We call collecting "similar" data into groups and then performing operations on them a groupby operation. It's probably wortwhile reading the tutorial section on the groupby split-apply-combine idiom -- there are lots of neat things you can do!

    The pandorable way to compute the pit values would be something like

    df.groupby("Col")["IsBuy"].mean()
    

    For example:

    >>> # make dummy data
    >>> N = 10**4
    >>> df = pd.DataFrame({"Col": np.random.randint(1, 10, N), "IsBuy": np.random.choice([True, False], N)})
    >>> df.head()
       Col  IsBuy
    0    3  False
    1    6   True
    2    6   True
    3    1   True
    4    5   True
    >>> df.groupby("Col")["IsBuy"].mean()
    Col
    1      0.511709
    2      0.495697
    3      0.489796
    4      0.510658
    5      0.507491
    6      0.513183
    7      0.522936
    8      0.488688
    9      0.490498
    Name: IsBuy, dtype: float64
    

    which you could turn into a dictionary from a Series if you insisted:

    >>> df.groupby("Col")["IsBuy"].mean().to_dict()
    {1: 0.51170858629661753, 2: 0.49569707401032703, 3: 0.48979591836734693, 4: 0.51065801668211308, 5: 0.50749063670411987, 6: 0.51318267419962338, 7: 0.52293577981651373, 8: 0.48868778280542985, 9: 0.49049773755656106}