Search code examples
pythonpandasnumpykeyerror

Python KeyError: pandas: match row value to column name/key where some keys are missing


I have DataFrame which looks something like below:

Q5 | Q10 | Q41 | item 
a  | b   | c   |  Q5  
d  | e   | f   |  Q10  
g  | h   | i   |  Q571    
j  | k   | l   |  Q23340  
m  | n   | o   |  Q41  
h  | p   | s   |  Q10  

Where Q5, Q10, Q41, item are column names of the DataFrame. I want to add one more column "name" which will have value of the column where value of column "item" matched with the column name. So I want it to look like as below:

Q5 | Q10 | Q41 | item    | name
a  | b   | c   |  Q5     |  a
d  | e   | f   |  Q10    |  e
g  | h   | i   |  Q571   |  NA
j  | k   | l   |  Q23340 |  NA
m  | n   | o   |  Q41    |  o
h  | p   | s   |  Q10    |  p

The problem here is, there are more items than columns. So not all the values in column item exist as columns which causes keyError. I tried doing like below:

df['col_exist'] = [(col in df.columns) for col in df.item]
df['name'] = np.where(df['col_exist']==True, df[df.item], np.nan)

And I get error as:

KeyError: "['Q571', 'Q23340'] not in index"

I also tried using df.apply as below:

 df['name'] = np.where(df['col_exist']==True, df.apply(lambda x: x[x.item], axis=1), np.nan)

But I am getting error as below:

KeyError: ('Q571', 'occurred at index 2')

I am not sure why it is trying to access the column which does not exist despite of placing col_exit check there.

Can someone please help me to resolve this issue?


Solution

  • You can filter item column based on columns then use lookup i.e

    df['new'] = df['item'].apply(lambda x : x if x in df.columns else np.nan)
    

    or

    df['new'] = np.where(df['item'].isin(df.columns), df['item'], np.nan)
    df['name'] = np.nan
    df['name'] = df.lookup(df.index,df['new'].fillna('name'))
    

    Output:

        Q5    Q10    Q41    item  new   name
    0  a     b      c         Q5   Q5    a  
    1  d     e      f        Q10  Q10    e   
    2  g     h      i       Q571  NaN    NaN
    3  j     k      l     Q23340  NaN    NaN
    4  m     n      o        Q41  Q41    o   
    5  h     p      s        Q10  Q10    p   
    

    To remove new column df = df.drop('new',1)

    To make your approach work instead of df[df.item] use df['item']

    df['name'] = np.where(df['col_exist']==True, df['item'], np.nan)