Tags: python, pandas, dataframe, data-analysis

Data column values are not changing to float


I have a dataframe, df:
    Name    Stage   Description
0   sri      1      sri is one of the good singer in this two
1   nan      2      thanks for reading
2   ram      1      ram is two of the good cricket player
3   ganesh   1      one driver
4   nan      2      good buddies

I tried

    df["Stage"] = pd.to_numeric(df["Stage"], downcast="float")

but the values are still the same.


Solution

  • You can use df.Stage.astype(float):

    In [6]: df.Stage.astype(float)
    Out[6]: 
    0    1.0
    1    2.0
    2    1.0
    3    1.0
    4    2.0
    Name: Stage, dtype: float64
    
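Note that astype returns a new Series and leaves the dataframe untouched, so the result has to be assigned back to the column for the change to stick. A minimal sketch, assuming a small frame rebuilt by hand from the question's data (the construction below is illustrative, not the asker's actual code):

```python
import pandas as pd

# Hypothetical frame mirroring the "Name" and "Stage" columns from the question.
df = pd.DataFrame({
    "Name": ["sri", None, "ram", "ganesh", None],
    "Stage": [1, 2, 1, 1, 2],
})

# astype returns a converted copy; the column in df keeps its integer dtype.
converted = df["Stage"].astype(float)
print(df["Stage"].dtype, converted.dtype)

# Assign the result back to the column to keep the float values.
df["Stage"] = df["Stage"].astype(float)
print(df["Stage"].dtype)
```

If the column still looks unchanged after a conversion, checking df.dtypes is usually quicker than inspecting the printed values.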
    

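    When memory matters, pd.to_numeric with downcast='float' is the better choice: it converts to the smallest float dtype that can hold the values (float32 at minimum), whereas astype(float) always produces float64.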

    Example

    In [23]: df.Stage 
    Out[23]: 
    0    1
    1    2
    2    1
    3    1
    4    2
    Name: Stage, dtype: int64
    
    In [24]: import sys 
    
    In [25]: sys.getsizeof(df.Stage)
    Out[25]: 272
    
    In [26]: sys.getsizeof(df.Stage.astype(float))
    Out[26]: 272
    
    In [27]: sys.getsizeof(pd.to_numeric(df.Stage, downcast='float'))
    Out[27]: 252
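The same comparison can be made without sys.getsizeof, which also counts Python object overhead. The sketch below assumes a standalone Series with the values from the question and uses Series.memory_usage(index=False) to report only the bytes held by the data:

```python
import pandas as pd

# Values taken from the question's Stage column.
stage = pd.Series([1, 2, 1, 1, 2], name="Stage")

as_float64 = stage.astype(float)                     # always float64
as_float32 = pd.to_numeric(stage, downcast="float")  # smallest float dtype that fits

# Compare the resulting dtypes and the bytes used by the data alone.
print(as_float64.dtype, as_float64.memory_usage(index=False))
print(as_float32.dtype, as_float32.memory_usage(index=False))
```

On five values the saving is tiny, but it scales linearly with the number of rows.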
    

    If df.Stage may contain bad data, pass errors='coerce' so that values that cannot be parsed as numbers become NaN:

        pd.to_numeric(df.Stage, errors='coerce', downcast='float')
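To illustrate the coercion, here is a small sketch that assumes the column arrives as strings with one unparseable entry. The "n/a" value is made up for the example and does not come from the question:

```python
import pandas as pd

# Hypothetical Stage column stored as strings, with one bad value.
stage = pd.Series(["1", "2", "n/a", "1", "2"], name="Stage")

# errors="coerce" replaces anything that cannot be parsed with NaN
# instead of raising an exception.
converted = pd.to_numeric(stage, errors="coerce", downcast="float")

print(converted)
print(converted.isna().sum())  # how many values failed to parse
```

Counting the NaN values afterwards is a cheap way to see how much of the column was silently dropped by the coercion.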