Search code examples
pythonpython-3.xpandasdataframeepoch

How to match and merge two dataframes having completely different values except numericals in columns of dataframe?


have a dataframe ABC of value

      id         |     price                          |   type
0     easdca     | Rs.1,599.00 was trasn by you       | unknown
1     vbbngy     | txn of INR 191.00 using            | unknown
2     awerfa     | Rs.190.78 credits was used by you  | unknown
3     zxcmo5     | DLR.2000 credits was used by you   | unknown

and other XYZ of value

         price          |   type
0      190.78           | food
1      191.00           | movie
2      2,000            | football
3      1,599.00         | basketball

how to map XYZ with ABC ,so that type in ABC get updated with type in xyz using values(numericals) in price of XYZ .

output i need

       id         |     price                          |   type
0     easdca     | Rs.1,599.00 was trasn by you        | basketball
1     vbbngy     | txn of INR 191.00 using             | movie
2     awerfa     | Rs.190.78 credits was used by you   | food
3     zxcmo5     | DLR.2,000 credits was used by you| football

used this

d = dict(zip(XYZ['PRICE'],XYZ['TYPE']))

pat = (r'({})'.format('|'.join(d.keys())))

ABC['TYPE']=ABC['PRICE'].str.extract(pat,expand=False).map(d)

But values likes 190.78 and 191.00 are getting mismatched. for example while working with huge data 190.78 should be matched with food values like 190.77 gets mismatched with food where it has other value assigned to it. And 198.78 also gets mismatched with some other ones where it should match with food


Solution

  • df

            id                price                                type
    0       easdca        Rs.1,599.00 was trasn by you          unknown
    1       vbbngy        txn of INR 191.00 using               unknown
    2       awerfa        Rs.190.78 credits was used by you     unknown
    3       zxcmo5        DLR.2000 credits was used by you      unknown
    

    df2

               price                   type
    0        190.78                    food
    1        191.00                   movie
    2        2,000                 football
    3        1,599.00            basketball
    

    using re

    df['price_'] = df['price'].apply(lambda x: re.findall(r'(?<=[\.\s])[\d\.]+',x.replace(',',''))[0])
    df2.columns = ['price_','type']
    df2['price_'] = df2['price_'].str.repalce(',','')
    

    Changing the types to float

    df2['price_']  = df2['price_'].astype(float)
    df['price_']  = df['price_'] .astype(float)
    

    Using pd.merge

    df = df.merge(df2, on='price_')
    df.drop('type_x', axis=1)
    

    Output

                    id                                 price   price_       type_y
    0      easdca        Rs.1,599.00 was trasn by you         1599.00   basketball
    1      vbbngy        txn of INR 191.00 using               191.00        movie
    2      awerfa        Rs.190.78 credits was used by you     190.78         food
    3      zxcmo5        DLR.2000 credits was used by you        2000     football