python pandas dataframe split extract

How to split/extract a new column and remove the extracted string from the column


I have a sample dataframe

import pandas as pd

data = {"col1": ["1 first 1", "2 second 2", "third 3", "4 fourth 4"]}

df = pd.DataFrame(data)
print(df)


         col1
0   1 first 1
1  2 second 2
2     third 3
3  4 fourth 4

I want to extract the leading digit of each value in col1 into a new column and remove it from col1.

I tried to extract it using

df["index"] = df["col1"].str.extract(r'(\d)')
print(df)

         col1 index
0   1 first 1     1
1  2 second 2     2
2     third 3     3
3  4 fourth 4     4

I want to remove the extracted digit from col1. If I use replace, both the leading and the trailing digits are removed.
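
A minimal sketch of that problem, assuming the replace attempt used the same unanchored \d pattern as the extract call (the name stripped_all is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({"col1": ["1 first 1", "2 second 2", "third 3", "4 fourth 4"]})

# Without an anchor, \d matches every digit in the string,
# so the trailing digit is removed along with the leading one.
stripped_all = df["col1"].str.replace(r"\d", "", regex=True)
print(stripped_all.tolist())
# [' first ', ' second ', 'third ', ' fourth ']
```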

Desired Output

       col1 index
0   first 1     1
1  second 2     2
2   third 3   NaN
3  fourth 4     4

Solution

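  • Use Series.str.replace together with Series.str.extract inside DataFrame.assign, so that both columns are computed from the original col1:

    # ^ anchors the match to the start of the string, so only a leading digit matches
    pat = r'(^\d)'
    # both arguments are evaluated against the original df["col1"];
    # the space that followed the digit is left at the start of col1
    df = df.assign(col1=df["col1"].str.replace(pat, '', regex=True),
                   index=df["col1"].str.extract(pat))
    print(df)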
            col1 index
    0    first 1     1
    1   second 2     2
    2    third 3   NaN
    3   fourth 4     4
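
The right-aligned printout hides the fact that removing only the digit leaves the separating space at the start of col1. If that space is unwanted, one possible variant (a sketch, not part of the original answer) moves the capture group around the digit alone and lets the pattern also consume the whitespace after it. The name result is illustrative, and expand=False is used here so that extract returns a Series.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["1 first 1", "2 second 2", "third 3", "4 fourth 4"]})

# ^ anchors at the start, (\d) captures the leading digit,
# \s* additionally matches the whitespace that follows it.
pat = r"^(\d)\s*"

result = df.assign(
    # remove the whole match (digit plus following whitespace)
    col1=df["col1"].str.replace(pat, "", regex=True),
    # keep only the captured digit; rows without a leading digit become NaN
    index=df["col1"].str.extract(pat, expand=False),
)
print(result["col1"].tolist())
# ['first 1', 'second 2', 'third 3', 'fourth 4']
```

The extracted values are strings; converting them with pd.to_numeric is an optional extra step if numbers are needed.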