
Drop rows from Pandas DataFrame based on value after decimal


I have a big DataFrame with values and names. After interpolating the values with a 0.1 increment, I want to remove rows whose numbers have more than one decimal place, such as 1111.123, and keep only rows like 1111.0, 1111.1, 1111.2 (or 1111.100...)

Value Name
1111.1 Mark
1111.2 Mark
1111.234 Mark
1111.3 Mark
1111.346 Mark
1111.4 Mark

I've tried Series.str.split() on the string column, but it is too slow for CSV files bigger than 1 GB. I've also tried a regex with

m = df['Value'].str.filter(regex='(\d*)\.(\d{3})')

But it returns an empty Series.

Another option I've tried is math.modf, but it raises

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Here is the code itself:

import pandas as pd
from math import modf

df = pd.read_csv("file.csv")

split = ((df['Value'].apply(lambda x: modf(x)[0])) * 10).apply(lambda x: modf(x)[0])
target_value = 0 < split < 1
df1 = df.loc[split == target_value]
# some code to remove rows

The final result I expect is:

Value Name
1111.1 Mark
1111.2 Mark
1111.3 Mark
1111.4 Mark

Solution

  • If you do not want to convert your floats to strings, use this simple math trick to check whether there are any decimal digits after the first one:

    df[(df['Value']*10%1).eq(0)]
    
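One caveat: binary floating point can make x*10 land slightly off an integer (for example, 4.35*100 evaluates to 434.99999999999994 in Python), so the modulo may come out as a tiny value near 0 or near 1 instead of exactly 0. A tolerance-based sketch of the same trick, using the Value column from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Value": [1111.1, 1111.2, 1111.234, 1111.3, 1111.346, 1111.4],
    "Name": ["Mark"] * 6,
})

# Fractional part of Value*10; due to float error it may be
# a hair above 0 or a hair below 1 for "clean" one-decimal values.
frac = (df["Value"] * 10) % 1

# Accept both near-0 and near-1 as "no further decimals".
kept = df[np.isclose(frac, 0) | np.isclose(frac, 1)]
```

The default isclose tolerance (1e-8) is far smaller than the 0.001-scale fractions you are trying to reject, so genuine extra decimals like .234 are still filtered out.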

    Alternatively, you can simply check whether the second-to-last character is a '.':

    df[df['Value'].astype(str).str[-2].eq('.')]
    
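Note that the str[-2] check relies on the float's string form having exactly one digit after the point. If the column is already read as strings (e.g. with dtype=str), an anchored regex makes the intent more explicit; a sketch on made-up string data:

```python
import pandas as pd

s = pd.Series(["1111.1", "1111.234", "1111.2", "1111.346"])

# Series.str.fullmatch anchors the pattern to the whole string:
# keep entries with exactly one digit after the decimal point.
mask = s.str.fullmatch(r"\d+\.\d")
kept = s[mask]
```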

    output:

        Value  Name
    0  1111.1  Mark
    1  1111.2  Mark
    3  1111.3  Mark
    5  1111.4  Mark
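As for the ValueError in the question: Python evaluates 0 < split < 1 as (0 < split) and (split < 1), and the "and" forces pandas to collapse a boolean Series into a single True/False, which is ambiguous. The elementwise equivalent combines the two comparisons with &, each wrapped in parentheses; a minimal sketch on made-up data:

```python
import pandas as pd

split = pd.Series([0.0, 0.34, 0.0, 0.46])

# 0 < split < 1 raises "The truth value of a Series is ambiguous";
# use elementwise & (not "and") with parenthesized comparisons.
mask = (split > 0) & (split < 1)
```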