Search code examples
pandasdataframedictionaryloggingnormalization

Pandas and Dictionary - Match df column name to a dictionary key, then retrieve the value to do calculation to values of that column


I have two dataframes in 2 separate .csv files the first one being:

             Apple.count.out Orange.count.out Banana.count.out
Total stock   1000              500            1000
average_price   2                2              0.5

I have casted the first one into a dictionary

fruit={'apple':1000,'orange':500,'banana':1000}

df2 consists of a list of store_name containing the stock of individual fruits and as follows:

          apple    orange   banana
store_a    60       25       100
store_b    100      25       200
store_c    200      100      200
store_d    300      300     200        

My task is to use use the value from the dictionary to normalise the stocks in individual store for respective food as the total number of stock across different fruits are different in the first place. I think of matching the dictionary name the retrieve the key and writing a function to loop every column to divide the count in every single store by the total number of count(the value in the dictionary) if the column name matches the key. Turning something into this:

          apple    orange      banana
store_a    0.06      0.05        0.1
store_b    0.10      0.05        0.2
store_c    0.20      0.20        0.2
store_d    0.30      0.60        0.2       

The i need to log10 transform every value. I tried using the following the function:

import numpy as np
import pandas as pd

fruits = ['apple','orange','banana']
df2[fruits] = df2[fruits].apply(lambda x: (x/t) if t = df2.columns.values for t,u in fruit.items())
df2[fruits] = np.log10(df2[fruits])

But the function seems to not working properly and i am not sure whether this function is elegant enough? Can i have some suggestion on how to improve it please?

And as i hard-coded the dictionary for the first df, is there another way of constructing a function casting them into a dictionary? Thank you very much in advance.


Solution

  • You can use:

    fruit = df1.rename(columns=lambda x: x.split('.')[0].lower()).loc['Total stock']
    out = df2 / fruit
    

    Output:

    >>> out
             apple  orange  banana
    store_a   0.06    0.05     0.1
    store_b   0.10    0.05     0.2
    store_c   0.20    0.20     0.2
    store_d   0.30    0.60     0.2
    
    >>> np.log10(out)
                apple    orange   banana
    store_a -1.221849 -1.301030 -1.00000
    store_b -1.000000 -1.301030 -0.69897
    store_c -0.698970 -0.698970 -0.69897
    store_d -0.522879 -0.221849 -0.69897