I have a sample DataFrame:
import pandas as pd

data = {"col1": ["1 first 1", "2 second 2", "third 3", "4 fourth 4"]}
df = pd.DataFrame(data)
print(df)
         col1
0   1 first 1
1  2 second 2
2     third 3
3  4 fourth 4
I want to extract the leading digit of each value in the column (only when the string starts with a digit) and remove it from the original column.
I tried to extract it using:
df["index"] = df["col1"].str.extract(r'(\d)')
         col1 index
0   1 first 1     1
1  2 second 2     2
2     third 3     3
3  4 fourth 4     4
Now I want to remove the extracted digit from col1. If I use replace, both the leading and the trailing digits are removed.
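To illustrate the problem, here is a minimal sketch. It assumes the replacement uses the unanchored pattern `\d`, which the question does not state explicitly; the name `stripped` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["1 first 1", "2 second 2", "third 3", "4 fourth 4"]})

# Assumed pattern: an unanchored \d matches every digit in each string,
# so the trailing digit is removed along with the leading one.
stripped = df["col1"].str.replace(r"\d", "", regex=True)
print(stripped.tolist())
# [' first ', ' second ', 'third ', ' fourth ']
```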
Desired output:
       col1 index
0   first 1     1
1  second 2     2
2   third 3   NaN
3  fourth 4     4
Use Series.str.replace together with Series.str.extract, both with a pattern anchored to the start of the string, and DataFrame.assign so that both new columns are computed from the original col1:
# ^ anchors the match to the start of the string, so only a leading digit matches
pat = r'(^\d)'
df = df.assign(col1=df["col1"].str.replace(pat, '', regex=True),
               index=df["col1"].str.extract(pat))
print(df)
        col1 index
0    first 1     1
1   second 2     2
2    third 3   NaN
3   fourth 4     4
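A self-contained variant of the same approach, as a sketch rather than the only way to do it. Two things here are assumptions not in the answer above: the pattern also consumes whitespace after the leading digit (`\s*`), on the guess that the leftover leading space is unwanted, and `expand=False` is passed so that `str.extract` returns a Series rather than a one-column DataFrame. The name `result` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["1 first 1", "2 second 2", "third 3", "4 fourth 4"]})

# ^ anchors the match to the start of the string.
# (\d) captures the leading digit; \s* (an assumption) also swallows the
# whitespace that follows it, so no leading space is left behind.
pat = r"^(\d)\s*"

# Both keyword arguments are evaluated against the original df["col1"]
# before assign() builds the new frame.
result = df.assign(
    col1=df["col1"].str.replace(pat, "", regex=True),
    index=df["col1"].str.extract(pat, expand=False),
)
print(result)
```

Note that the extracted values are strings, not integers, and rows without a leading digit get NaN. With the pattern `(^\d)` alone, the values in col1 keep the space that followed the removed digit; the right-aligned printout hides this.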