For example, in the "Sample" column below, I need to remove rows 2, 3 and 5 because their values start with either a zero or a special character.
Index | Sample |
---|---|
1 | 12345 |
2 | 00152 |
3 | 09 |
4 | 325 |
5 | .1246 |
I tried changing the "Sample" column datatype to string and extracting the first character like this:
t = df['Sample'].astype(str).str[0].astype(int)
But it gives me this output:
print(t)
| 1 | 1 |
| 2 | 1 |
| 3 | 9 |
| 4 | 3 |
| 5 | 0 |
I want it like this so that I can remove the respective rows using the index value:
| 1 | 1 |
| 2 | 0 |
| 3 | 0 |
| 4 | 3 |
| 5 | . |
Is my approach correct? Can anyone please help me with this? Thanks a lot.
You may try using str.match as follows:
df = df[df["Sample"].str.match(r'[1-9]')]
This keeps only the rows whose Sample value starts with a nonzero digit. str.match anchors the pattern at the start of each string, so no leading ^ is needed.
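The output in the question (1, 1, 9, 3, 0) looks like what you would get if pandas had already parsed "Sample" as a number, so that 00152 became 152 and .1246 became 0.1246 before the conversion to string. That is a guess, since the loading code is not shown. A minimal sketch under that assumption, using a made-up inline CSV and the `dtype` argument of `read_csv` to keep the column as text:

```python
import io
import pandas as pd

# Hypothetical CSV standing in for the real data source (an assumption).
csv_text = "Index,Sample\n1,12345\n2,00152\n3,09\n4,325\n5,.1246\n"

# Reading "Sample" as str preserves leading zeros and the leading dot.
df = pd.read_csv(io.StringIO(csv_text), dtype={"Sample": str}, index_col="Index")

# First character of each value, kept as a string (no cast to int).
first_char = df["Sample"].str[0]

# True where the value starts with a digit from 1 to 9.
mask = df["Sample"].str.match(r"[1-9]")

kept = df[mask]                            # rows to keep
dropped_index = df.index[~mask].tolist()   # index values of the removed rows

print(first_char.tolist())  # ['1', '0', '0', '3', '.']
print(dropped_index)        # [2, 3, 5]
print(kept)
```

If the data is already in memory as numbers, the leading zeros are gone and cannot be recovered with astype(str); the column has to be read as text in the first place.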