i'm very new to python transformation world using pandas! I'm here to asking some code problem stuck by myself.
For my case, i'm using python pandas to filter row value without eliminate them - as just select them out without delete them from the dataframe column within that dataframe column. However, within that dataframe column consist of mixed of float(value more than 2 decimal places) and integer (consist only one decimal place which the output as x.0
)
for this dataframe column, in python pandas dataframe, it does default assume that column's datatype to float64
, so it's hard for me to determine/filter using isinstance()
and alternative function/ way to deal with this type mixed data type column without eliminate the unwanted value
Below is a mock of my dataset which structure from a dataframe:
data = {
'time': [00.00.01,00.00.02,00.00.03,00.00.04,00.00.05,00.00.06,00.00.07,00.00.08,00.00.09,00.00.10,00.00.11,00.00.12,
00.00.13,00.00.14,00.00.15,00.00.16,00.00.17,00.00.18,00.00.19,00.00.20,00.00.21,00.00.22,00.00.23,00.00.24],
'value': [1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 1, 2, 3, 4, 5, 6, 1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 1, 2, 3, 4, 5, 6]
data = pd.DataFrame(data)
while print out as a dataframe, the data looks like:
> time value
1 00.00.1 1.234
2 00.00.2 2.345
3 00.00.3 3.456
4 00.00.4 4.567
5 00.00.5 5.678
6 00.00.6 6.789
7 00.00.7 1.0
8 00.00.8 2.0
9 00.00.9 3.0
10 00.00.10 4.0
11 00.00.11 5.0
12 00.00.12 6.0
13 00.00.13 1.234
14 00.00.14 2.345
15 00.00.15 3.456
16 00.00.16 4.567
17 00.00.17 5.678
18 00.00.18 6.789
19 00.00.19 1.0
20 00.00.10 2.0
21 00.00.21 3.0
22 00.00.22 4.0
23 00.00.23 5.0
24 00.00.24 6.0
I've tried for the below solution to fix this problem with. -My solution- this will remove the values that with only one decimal place while keep the value more than one decimal places.
value = data[(data['value'].astype(str).str.split('.').str[1].str.len()>1)]
and the printed output might be like:
> time value
0 00.00.1 1.234
1 00.00.2 2.345
2 00.00.3 3.456
3 00.00.4 4.567
4 00.00.5 5.678
5 00.00.6 6.789
13 00.00.13 1.234
14 00.00.14 2.345
15 00.00.15 3.456
16 00.00.16 4.567
17 00.00.17 5.678
18 00.00.18 6.789
I'm expecting a dataframe could get a result like this without eliminate any unwanted row values even though there is undergoing a filter to select only the value with only one decimal place:
> time value
1 00.00.1 1.234
2 00.00.2 2.345
3 00.00.3 3.456
4 00.00.4 4.567
5 00.00.5 5.678
6 00.00.6 6.789
7 00.00.7 1.0
8 00.00.8 2.0
9 00.00.9 3.0
10 00.00.10 4.0
11 00.00.11 5.0
12 00.00.12 6.0
13 00.00.13 1.234
14 00.00.14 2.345
15 00.00.15 3.456
16 00.00.16 4.567
17 00.00.17 5.678
18 00.00.18 6.789
19 00.00.19 1.0
20 00.00.10 2.0
21 00.00.21 3.0
22 00.00.22 4.0
23 00.00.23 5.0
24 00.00.24 6.0
I really do need your help to fix this problem, thanks in advance!
What you want is really unclear:
filter row value without eliminate them - as just select them out without delete them from the dataframe column.
In this case, just use a boolean mask without selecting data from your dataframe:
m = data['value'].ne(data['value'].round())
Output:
# boolean mask
>>> m
1 True
2 True
3 True
4 True
5 True
6 True
7 False
8 False
9 False
10 False
11 False
12 False
13 True
14 True
15 True
16 True
17 True
18 True
19 False
20 False
21 False
22 False
23 False
24 False
Name: value, dtype: bool
# selection
>>> data[m]
time value
1 00.00.1 1.234
2 00.00.2 2.345
3 00.00.3 3.456
4 00.00.4 4.567
5 00.00.5 5.678
6 00.00.6 6.789
13 00.00.13 1.234
14 00.00.14 2.345
15 00.00.15 3.456
16 00.00.16 4.567
17 00.00.17 5.678
18 00.00.18 6.789