Search code examples
pythondataframedictionarylabel

how can I apply label encoding by using a dataframe without using labelencoder etc. in Python?


I have two data frames: first:

l1=[123,345,546,245]
l2=["a","a","b","b"]
l3=["j","k","l","pp"]
l4=["m","n","m","n"]

df1=pd.DataFrame(list(zip(l1,l2,l3,l4)),columns=['id','X','Y','Z'])
df1.head()

enter image description here

second:

l1=["X","X","Y","Y","Y","Y","Z","Z"]
l2=["a","b","j","k","l","pp","m","n"]
l3=["1","2","1","2","3","4","1","2"]

df2=pd.DataFrame(list(zip(l1,l2,l3)),columns=["labelnames","levels","labels"])
df2

enter image description here

I need to do label encoding to the first data frame values by using the second data frame. I tried converting the second data frame to a dictionary and then replacing the first data frame values.

such as;

 dict= {"X" : { a:1 , b:2}
     "Y": {j:1,k:2,l:3, pp:4},
      "Z" : {m:1,n:2}}

I couldn't print the proper dictionary format.

I want to see the output like:

l1=[123,345,546,245]
l2=["1","1","2","2"]
l3=["1","2","3","4"]
l4=["1","2","1","2"]

df1=pd.DataFrame(list(zip(l1,l2,l3,l4)),columns=['id','X','Y','Z'])
df1.head()

enter image description here

So, How can I replace the first data frame values( X, Y, Z) with second data frame labels?


Solution

  • here is one way to do it, using melt, and pivot

    # melt the dataframe to make it columner
    # merge with the reference dataframe
    # finally pivot to get back to the original format
    (df.melt(id_vars=['id'], var_name='key' ).merge(df2,
                                                    left_on=['key','value'],
                                                    right_on=['labelnames','levels'],
                                                    how='left'
                                                   ) 
     .pivot(index='id', columns='key', values='labels')
     .reset_index()
     .rename_axis(columns=None)
    )
    
    
             id     X   Y   Z
        0   123     1   1   1
        1   245     2   4   2
        2   345     1   2   2
        3   546     2   3   1