I know how to select row by specific condition as below.
df[df['col'] == 'example']
But I am very confusing when I use this syntax after using read_excel()
and melt()
.
Following is my example and I've trouble in figuring out why it happen.
Correct works :
import pandas as pd
test = pd.DataFrame({'term' : ['mid term',' final'],
'20230001' : ['A+','A0'],
'20030002' : ['B-', 'B+'],
'20030003' : ['A-','A-']})
test2 = pd.melt(test, id_vars=['term'], value_name = 'score')
test2.dtypes
# Out[138]:
# term object
# variable object
# score object
# dtype: object
test2[test2['variable'] == '20230001']
# Out[141]:
# term variable score
# 0 mid term 20230001 A+
# 1 final 20230001 A0
Something wrong :
df = pd.read_excel(r'C:\Users\test.xlsx') # test.xlsx is same data as test
df
# term 20230001 20230002 20230003
# 0 mid term A+ B- A-
# 1 final A0 B+ A-
df2 = pd.melt(df, id_vars=['term'], value_name = 'score')
df2.dtypes
# Out[145]:
# term object
# variable object
# score object
# dtype: object
df2[df2['variable'] == '20230001']
# Out[146]:
# Empty DataFrame
# Columns: [term, variable, score]
# Index: []
########## Not working well ###########
########## But it works well when I deal 'variable' as numeric ##########
df2[df2['variable'] == 20230001]
# Out[147]:
# term variable score
# 0 mid term 20230001 A+
# 1 final 20230001 A0
Your code must work. I suspect you have spaces in df2['variable']
values. Try this line. strip
removes spaces from the start and end of strings:
df2[df2['variable'].str.strip()== '20230001']