I have a big DataFrame with values and names. After interpolating the values with an increment of 0.1, I want to remove the rows with numbers such as 1111.123 and keep only rows with 1111.0, 1111.1, 1111.2 (or 1111.100...)
Value | Name |
---|---|
1111.1 | Mark |
1111.2 | Mark |
1111.234 | Mark |
1111.3 | Mark |
1111.346 | Mark |
1111.4 | Mark |
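For anyone who wants to reproduce this, here is a small toy frame matching the sample above (just a stand-in for the real CSV, which is over 1 GB):

import pandas as pd

# Toy stand-in for the real file; the actual data is read with pd.read_csv below.
df = pd.DataFrame({
    'Value': [1111.1, 1111.2, 1111.234, 1111.3, 1111.346, 1111.4],
    'Name': ['Mark'] * 6,
})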
I've tried Series.str.split() on the column as strings, but it is too slow for CSVs bigger than 1 GB. I've also tried a regex:
m = df['Value'].str.filter(regex='(\d*)\.(\d{3})')
But it returns an empty Series.
Another option I've tried is math.modf, but it raises:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Here is the code itself:
import pandas as pd
from math import modf
df = pd.read_csv("file.csv")
split = ((df['Value'].apply(lambda x: modf(x)[0])) * 10).apply(lambda x: modf(x)[0])  # fractional part beyond the first decimal place
target_value = 0 < split < 1  # this chained comparison is the line that raises the ValueError
df1 = df.loc[split == target_value]
*some code to remove rows*
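As far as I can tell, the ValueError comes from the chained comparison 0 < split < 1: Python expands it to (0 < split) and (split < 1), and the and tries to collapse a whole Series into a single True/False. An element-wise version like the sketch below gets rid of the error, but it still doesn't give me the filtering I want:

# Element-wise comparison; & is applied row by row instead of to the whole Series.
in_range = (split > 0) & (split < 1)
df1 = df.loc[in_range]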
The final result I expect is:
Value | Name |
---|---|
1111.1 | Mark |
1111.2 | Mark |
1111.3 | Mark |
1111.4 | Mark |
If you do not want to convert your floats to strings, use this simple math trick to check whether there is anything beyond the first decimal place:
df[(df['Value']*10%1).eq(0)]
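The reasoning: a value with at most one decimal place becomes a whole number once multiplied by 10, so the remainder modulo 1 is 0, while something like 1111.234 leaves a non-zero remainder. The same filter spelled out as a named mask:

# True where Value has at most one decimal place, False otherwise.
mask = (df['Value'] * 10 % 1).eq(0)
df[mask]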
Alternatively, you can simply check whether the second-to-last character of the string representation is a dot:
df[df['Value'].astype(str).str[-2].eq('.')]
output:
Value Name
0 1111.1 Mark
1 1111.2 Mark
3 1111.3 Mark
5 1111.4 Mark
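If you would rather stay on the regex route from the question, a sketch with Series.str.fullmatch (available in pandas 1.1+) gives the same result on this sample, assuming non-negative values and no NaNs in the column:

# Keep rows whose string form is '<digits>.<one digit>'.
df[df['Value'].astype(str).str.fullmatch(r'\d+\.\d')]

Like the second option, this converts the column to strings, so the arithmetic mask above will likely be faster on a 1 GB file.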