
Extracting portions of the entries of a Pandas dataframe


I have a Pandas dataframe with several columns whose entries are combinations of digits, upper- and lower-case letters, and a few special characters, i.e. characters from the set "=A-Za-z0-9_|". Each entry of a column has the form:

'x=ABCDefgh_5|123|'

I want to keep only the digits 0-9 that appear between the two | characters and strip out everything else. Here is my code for one column of the dataframe:

list(map(lambda x: x.lstrip(r'\[=A-Za-z_|,]+'), df[1]))

However, the code returns the full entry 'x=ABCDefgh_5|123|' without stripping out anything. Is there an error in my code?


Solution

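  • Python's str.lstrip does not take a regex: it treats its argument as a plain set of characters to remove from the left, and the leading 'x' is not in that set, so nothing is stripped. Instead of working with hard-to-read regular expressions, you might want to consider a simple split on "|". For example: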

    import pandas as pd
    
    d = {'col': ["x=ABCDefgh_5|123|", "x=ABCDefgh_5|123|"]}
    df = pd.DataFrame(data=d)
    
    output = df["col"].str.split("|").str[1]
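
If you would still prefer a pattern-based approach, the following is a minimal sketch rather than a definitive fix. It assumes the same sample data and column name "col" as above, and that every entry contains exactly one run of digits enclosed by | characters. It first shows that lstrip leaves the entries unchanged, then captures the digits with Series.str.extract:

```python
import pandas as pd

# Sample data copied from the answer above; "col" is an illustrative column name.
df = pd.DataFrame({"col": ["x=ABCDefgh_5|123|", "x=ABCDefgh_5|123|"]})

# str.lstrip treats its argument as a set of characters, not as a regex.
# The set here contains 'a', 'z', 'A', 'Z', '-', and so on, but not 'x',
# so stripping stops at the very first character.
unchanged = df["col"].map(lambda x: x.lstrip(r'\[=A-Za-z_|,]+'))

# Capture one or more digits enclosed by literal pipes.
# expand=False returns a Series because there is a single capture group.
digits = df["col"].str.extract(r"\|(\d+)\|", expand=False)

print(unchanged.tolist())
print(digits.tolist())
```

Both this and the split approach return strings; append .astype(int) if you need integers.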