Search code examples
pandasconditional-statements

How to fix problem regarding selecting row by condition from pandas


I know how to select row by specific condition as below.

df[df['col'] == 'example']

But I am very confusing when I use this syntax after using read_excel() and melt().

Following is my example and I've trouble in figuring out why it happen.

Correct works :

import pandas as pd
test = pd.DataFrame({'term' : ['mid term',' final'],
                     '20230001' : ['A+','A0'], 
                     '20030002' : ['B-', 'B+'], 
                     '20030003' : ['A-','A-']})
test2 = pd.melt(test, id_vars=['term'], value_name = 'score')

test2.dtypes
# Out[138]: 
# term        object
# variable    object
# score       object
# dtype: object

test2[test2['variable'] == '20230001']
# Out[141]: 
#       term  variable score
# 0  mid term  20230001    A+
# 1     final  20230001    A0

Something wrong :


df = pd.read_excel(r'C:\Users\test.xlsx')   # test.xlsx is same data as test

df
#        term 20230001 20230002 20230003
# 0  mid term       A+       B-       A-
# 1     final       A0       B+       A-

df2 = pd.melt(df, id_vars=['term'], value_name = 'score')

df2.dtypes
# Out[145]: 
# term        object
# variable    object
# score       object
# dtype: object

df2[df2['variable'] == '20230001']
# Out[146]: 
# Empty DataFrame
# Columns: [term, variable, score]
# Index: []

########## Not working well ###########
########## But it works well when I deal 'variable' as numeric ##########

df2[df2['variable'] == 20230001]
# Out[147]: 
#        term  variable score
# 0  mid term  20230001    A+
# 1     final  20230001    A0

Solution

  • Your code must work. I suspect you have spaces in df2['variable'] values. Try this line. strip removes spaces from the start and end of strings:

    df2[df2['variable'].str.strip()== '20230001']