Remove leading zeros or foreign characters in sequence number


E.g., in the column "Sample" below, I need to remove rows 2, 3 & 5 because they start with either a leading zero or a special character.

Index Sample
1 12345
2 00152
3 09
4 325
5 .1246

I tried changing the "Sample" column datatype to string and extracting the first character like this:

t = df['Sample'].astype(str).str[0].astype(int)

But it gives me this output:

print(t)

 
|   1   |  1  | 
|   2   |  1  | 
|   3   |  9  | 
|   4   |  3  |
|   5   |  0  | 

I want it like this so that I can remove the respective rows using the index value:

 
|   1   |  1  | 
|   2   |  0  | 
|   3   |  0  | 
|   4   |  3  |
|   5   |  .  | 

Is my approach correct? Can anyone please help me with this? Thanks a lot.


Solution

  • You may try using str.match as follows:

    df = df[df["Sample"].str.match(r'[1-9]')]
    

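    This retains only the rows whose Sample value starts with a nonzero digit, because str.match anchors the pattern at the start of the string. It assumes the column holds strings; if pandas has already parsed it as numbers, the leading zeros and the leading dot are gone before the match runs.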
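
A minimal sketch of the whole flow, assuming the data comes from a CSV-like source (the inline `raw` text below is a hypothetical stand-in for the real file). It suggests why the attempt in the question likely printed 1, 1, 9, 3, 0: when the column is parsed as numbers, `00152` becomes `152.0` and `.1246` becomes `0.1246` before `astype(str)` is ever called. Reading the column with `dtype=str` keeps the original text, so the first character and the `str.match` filter behave as expected.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real input file
raw = "Index,Sample\n1,12345\n2,00152\n3,09\n4,325\n5,.1246\n"

# Default parsing: Sample becomes a float column, so leading zeros
# and the leading dot are lost before any string conversion
numeric_df = pd.read_csv(io.StringIO(raw), index_col="Index")
first_char_numeric = numeric_df["Sample"].astype(str).str[0]

# Reading Sample as text preserves the characters exactly as written
df = pd.read_csv(io.StringIO(raw), index_col="Index", dtype={"Sample": str})
first_char = df["Sample"].str[0]

# Keep only rows whose value starts with a digit from 1 to 9
filtered = df[df["Sample"].str.match(r"[1-9]")]

print(first_char_numeric.tolist())
print(first_char.tolist())
print(filtered)
```

If the DataFrame is built some other way, the same idea applies: make sure "Sample" is stored as strings at creation time, since converting with `astype(str)` afterwards cannot bring back characters that numeric parsing already dropped.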