Search code examples
pythonpandasdataframedecimaltransformation

using python pandas to filter row value that within that dataframe column


i'm very new to python transformation world using pandas! I'm here to asking some code problem stuck by myself. For my case, i'm using python pandas to filter row value without eliminate them - as just select them out without delete them from the dataframe column within that dataframe column. However, within that dataframe column consist of mixed of float(value more than 2 decimal places) and integer (consist only one decimal place which the output as x.0) for this dataframe column, in python pandas dataframe, it does default assume that column's datatype to float64, so it's hard for me to determine/filter using isinstance() and alternative function/ way to deal with this type mixed data type column without eliminate the unwanted value

Below is a mock of my dataset which structure from a dataframe:

data = {
'time': [00.00.01,00.00.02,00.00.03,00.00.04,00.00.05,00.00.06,00.00.07,00.00.08,00.00.09,00.00.10,00.00.11,00.00.12,
00.00.13,00.00.14,00.00.15,00.00.16,00.00.17,00.00.18,00.00.19,00.00.20,00.00.21,00.00.22,00.00.23,00.00.24],
'value': [1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 1, 2, 3, 4, 5, 6, 1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 1, 2, 3, 4, 5, 6]
data = pd.DataFrame(data)

while print out as a dataframe, the data looks like:

> time value
1 00.00.1 1.234
2 00.00.2 2.345
3 00.00.3 3.456
4 00.00.4 4.567
5 00.00.5 5.678
6 00.00.6 6.789
7 00.00.7 1.0
8 00.00.8 2.0
9 00.00.9 3.0
10 00.00.10 4.0
11 00.00.11 5.0
12 00.00.12 6.0
13 00.00.13 1.234
14 00.00.14 2.345
15 00.00.15 3.456
16 00.00.16 4.567
17 00.00.17 5.678
18 00.00.18 6.789
19 00.00.19 1.0
20 00.00.10 2.0
21 00.00.21 3.0
22 00.00.22 4.0
23 00.00.23 5.0
24 00.00.24 6.0

I've tried for the below solution to fix this problem with. -My solution- this will remove the values that with only one decimal place while keep the value more than one decimal places.

value = data[(data['value'].astype(str).str.split('.').str[1].str.len()>1)]

and the printed output might be like:

> time value
0 00.00.1 1.234
1 00.00.2 2.345
2 00.00.3 3.456
3 00.00.4 4.567
4 00.00.5 5.678
5 00.00.6 6.789
13 00.00.13 1.234
14 00.00.14 2.345
15 00.00.15 3.456
16 00.00.16 4.567
17 00.00.17 5.678
18 00.00.18 6.789

I'm expecting a dataframe could get a result like this without eliminate any unwanted row values even though there is undergoing a filter to select only the value with only one decimal place:

> time value
1 00.00.1 1.234
2 00.00.2 2.345
3 00.00.3 3.456
4 00.00.4 4.567
5 00.00.5 5.678
6 00.00.6 6.789
7 00.00.7 1.0
8 00.00.8 2.0
9 00.00.9 3.0
10 00.00.10 4.0
11 00.00.11 5.0
12 00.00.12 6.0
13 00.00.13 1.234
14 00.00.14 2.345
15 00.00.15 3.456
16 00.00.16 4.567
17 00.00.17 5.678
18 00.00.18 6.789
19 00.00.19 1.0
20 00.00.10 2.0
21 00.00.21 3.0
22 00.00.22 4.0
23 00.00.23 5.0
24 00.00.24 6.0

I really do need your help to fix this problem, thanks in advance!


Solution

  • What you want is really unclear:

    filter row value without eliminate them - as just select them out without delete them from the dataframe column.

    In this case, just use a boolean mask without selecting data from your dataframe:

    m = data['value'].ne(data['value'].round())
    

    Output:

    # boolean mask
    >>> m
    1      True
    2      True
    3      True
    4      True
    5      True
    6      True
    7     False
    8     False
    9     False
    10    False
    11    False
    12    False
    13     True
    14     True
    15     True
    16     True
    17     True
    18     True
    19    False
    20    False
    21    False
    22    False
    23    False
    24    False
    Name: value, dtype: bool
    
    # selection
    >>> data[m]
            time  value
    1    00.00.1  1.234
    2    00.00.2  2.345
    3    00.00.3  3.456
    4    00.00.4  4.567
    5    00.00.5  5.678
    6    00.00.6  6.789
    13  00.00.13  1.234
    14  00.00.14  2.345
    15  00.00.15  3.456
    16  00.00.16  4.567
    17  00.00.17  5.678
    18  00.00.18  6.789