I am new to Python and StackOverflow.
I am trying to convert some values in the column use_ab of my dataframe.
Here is what my column looks like:
df['use_ab'].value_counts()
False 534167
FALSE 15222
True 12724
TRUE 1023
What I want to do is convert all values to uppercase.
I tried this code:
df['use_ab'] = df['use_ab'].str.upper()
It converts "True" and "False" to uppercase but turns the rest into NaN values, and gives me this output:
FALSE 15222
TRUE 1023
Please help me convert this column to uppercase.
You have a mixed column of both string and boolean values (and maybe some other things too), and its dtype
is almost surely 'object' - you should check, and please confirm.
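One way to check is to count the Python type of each value in the column. The sketch below uses a small made-up Series in place of your df['use_ab'] (the sample values are an assumption, not your data):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df['use_ab']: strings, real bools and a NaN mixed together
s = pd.Series(['True', 'FALSE', False, True, np.nan])

# A mixed column like this is stored with the generic 'object' dtype
print(s.dtype)

# Count values by the name of their Python type
type_counts = s.map(lambda v: type(v).__name__).value_counts()
print(type_counts)
```

If more than one type name shows up (for example both str and bool), the column is mixed and the .str accessor will only handle the str part of it.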
Solution: You can (and should) specify the dtype of a problematic column when you read it in, and also specify ALL the true and false values at read time:
pd.read_csv(..., dtype={'use_ab': bool}, true_values=['TRUE', 'True'], false_values=['FALSE', 'False'])
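Here is a minimal, self-contained sketch of that read-time approach. The CSV text and the id column are invented for illustration, and io.StringIO stands in for your real file:

```python
import io

import pandas as pd

# Hypothetical CSV content with mixed-case spellings of true and false
csv_text = "id,use_ab\n1,True\n2,FALSE\n3,TRUE\n4,False\n"

df = pd.read_csv(
    io.StringIO(csv_text),            # stand-in for the real file path
    dtype={"use_ab": bool},           # force a proper boolean column
    true_values=["TRUE", "True"],     # every spelling that means True
    false_values=["FALSE", "False"],  # every spelling that means False
)

print(df["use_ab"].dtype)
print(df["use_ab"].value_counts())
```

Note that a plain bool column cannot hold missing values, so if your file has blanks in use_ab you would likely need to handle those rows separately.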
Note in particular that the string 'False' and the bool False are not the same thing, and using .str does not convert the bools.
Re: df.dtypes. The dtype of your column doesn't seem to be string, but it doesn't seem to be boolean either, since the string accessor .str.upper() is throwing away most of your 'False' values, as value_counts() proves.
Also, since your series evidently has NaNs and you need to check that they're not being mishandled, use .value_counts(..., dropna=False) to include them.
import pandas as pd
import numpy as np
df = pd.Series(['True',np.nan,'FALSE','TRUE',np.nan,'False',False,True,True])
# Now note that the dtype is automatically assigned to pandas 'object'!
>>> df.dtype
dtype('O')
>>> df.value_counts(dropna=False)
True 2
NaN 2
FALSE 1
TRUE 1
True 1
False 1
False 1
dtype: int64
See how mistakenly using the .str.upper() accessor on this mixed column trashes the values that are actually bools, while case-transforming the strings:
>>> df.str.upper()
0 TRUE
1 NaN
2 FALSE
3 TRUE
4 NaN
5 FALSE
6 NaN <-- bool False coerced to NaN!
7 NaN <-- bool True coerced to NaN!
8 NaN <-- bool True coerced to NaN!
dtype: object
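If the data is already loaded and you still want uppercase strings, one possible workaround is to convert each non-missing value to a string yourself before uppercasing. This is only a sketch on the same example series; to_upper_str is a hypothetical helper name, not a pandas function:

```python
import numpy as np
import pandas as pd

s = pd.Series(['True', np.nan, 'FALSE', 'TRUE', np.nan, 'False', False, True, True])

def to_upper_str(value):
    """Uppercase the text form of a value, leaving missing values alone."""
    if pd.isna(value):
        return value
    return str(value).upper()

# Both strings and real bools become 'TRUE' / 'FALSE'; NaN stays NaN
upper = s.map(to_upper_str)
print(upper.value_counts(dropna=False))

# Optionally map the cleaned strings to real booleans (NaN stays NaN)
as_bool = upper.map({'TRUE': True, 'FALSE': False})
```

Going through str(value) rather than .str.upper() is what keeps the bool entries from being turned into NaN.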