Search code examples
pandaslogical-operators

Compare last digit


A have a csv file with columns as below:

Last_Price, Price, Marked
3,2.89,
1.99,2.09,
3.9,3.79,

I created a pandas dataframe named subdf. I want to compare last digit of Last_Price and Price columns, If they are not equal I assign 'X' to the column 'Marked' otherwise leave it as blank or NaN.

I have tried:

subdf['Marked'] = np.where([x.strip()[-1] for x in subdf['Last_Price']] == [x.strip()[-1] for x in subdf['Price']],
'X',
np.nan)

It says: AttributeError: 'float' object has no attribute 'strip'

I tried the below as well but it didn't work. It doesn't capture last digit. I guess I need a for loop as well.

str(subdf['Price']).strip()[-1]

Solution

  • Here's an example.

    It assumes that all prices have 2 digits or less after the decimal point. And that for "3" you want to compare for the "0" in "3.00" and not the last digit "3".

    Multiplying by 10 moves only the last digit to the end

    Then doing a MOD (%) division by 1 removes the front.

    Please note: I removed the spaces from the csv captions to allow proper importing of the names

    import pandas as pd
    import io   
    
    TESTDATA = """Last_Price,Price,Marked
    3,2.89,
    1.99,2.09,
    3.9,3.79,"""
    
    subdf = pd.read_csv(io.StringIO(TESTDATA), sep=",")
    
    subdf.loc[((subdf['Last_Price'] * 10) % 1) != 
              ((subdf['Price'] * 10) % 1), 'Marked'] = 'X'
    
    print(subdf)
    

    The result

       Last_Price  Price Marked  
    0        3.00   2.89      X         
    1        1.99   2.09    NaN         
    2        3.90   3.79      X         
    

    If you really want to compare against the string output, so the "9" from "3.9" vs "3.79" match, then import the dataframe as a str

    import pandas as pd
    import io   
    
    TESTDATA = """Last_Price,Price,Marked
    3,2.89,
    1.99,2.09,
    3.9,3.79,"""
    
    subdf = pd.read_csv(io.StringIO(TESTDATA), sep=",", dtype='str')
    subdf.loc[(subdf['Last_Price'].str.slice(-1)) != 
               (subdf['Price'].str.slice(-1)), 'Marked'] = 'X'
    print(subdf)
    

    New result

      Last_Price Price Marked 
    0          3  2.89      X        
    1       1.99  2.09    NaN        
    2        3.9  3.79    NaN