Search code examples
python-2.7pandasdictionarydataframestartswith

Create new column in dataframe by applying math operation to column values based on a match


I have the following dataframes:

df1
name  phone  duration(m)
Luisa 443442  1
Jack  442334  6
Matt  442212  2
Jenny 453224  1

df2
prefix charge rate
443     0.8   0.3
446     0.8   0.4
442     0.6   0.1
476     0.8   0.3

my desired output is to match each phone number with its prefix (there are more prefixes than phone numbers) and calculate how much to charge per called by multiplying the duration of call for each phone number by the corresponding prefix charge plus the corresponding rate.

output ex.

df1
    name  phone  duration(m) bill
    Luisa 443442  1          (example: 1x0.3+0.8)
    Jack  442334  6          (example: 6x0.1+0.6)
    Matt  442212  2
    Jenny 453224  1

my idea was to convert df2 to a dictionary like so dict={'443':[0.3,0.8],'442':[0.1,0.6]...} so i could match each number with the dict key and then do the opertion with the corresponding value of that matching key. However is not working and would also like to know if there is a better alternative.


Solution

  • df1 = pd.DataFrame({'name':["Louisa","Jack","Matt","Jenny"],'phone':[443442,442334,442212,453224],'duration':[1,6,2,1]})
    df2 = pd.DataFrame({'prefix':[443,446,442,476],'charge':[0.8,0.8,0.6,0.8],'rate':[0.3,0.4,0.1,0.3]})
    
    df3=pd.concat((df1,df2),axis=1)
    
    df4=pd.DataFrame({"phone_pref":df3["phone"].astype(str).str[:3]})
    df4=df4["phone_pref"].drop_duplicates()
    
    df3["bill"]=None
    for j in range(len(df4)):
        for i in range(len(df3["prefix"])):
            if df3.loc[i,"prefix"]==int(df4.iloc[j]):
                df3.loc[i,"bill"]=df3.loc[i,"duration"]*df3.loc[i,"charge"]+df3.loc[i,"rate"]
    print(df3)
    
       duration    name   phone  charge  prefix  rate  bill
    0         1  Louisa  443442     0.8     443   0.3   1.1
    1         6    Jack  442334     0.8     446   0.4  None
    2         2    Matt  442212     0.6     442   0.1   1.3
    3         1   Jenny  453224     0.8     476   0.3  None
    

    The None values in the bill column are because in your excample no phone number has the prefixes 446 or 476 and thus they are not in the df4... Also the bill is calculated with the formula of yours given in the question