In a DataFrame, I want to extract a single-digit integer (0-9) that always follows a specific word in a string, and add it as a new column at a specific position (not at the end). In the simplified example below, I want to extract the digit that comes after the word 'number' (case-insensitive).
DataFrame:
import re
import pandas as pd
testDf = ['Number1', 'number2', 'aNumber8', 'Number6b']
df = pd.DataFrame(testDf, columns=['Tagname'])
Tagname
Number1
number2
aNumber8
Number6b
The code below works, but because it adds the column at the end of the DataFrame, I then have to move it:
df['Number'] = df['Tagname'].str.extract(r'number*(\d)', re.IGNORECASE)
Tagname Number
Number1 1
number2 2
aNumber8 8
Number6b 6
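A side note on the pattern itself: in number*(\d) the * quantifier applies only to the preceding r, not to the whole word, so the pattern would also match a tag such as 'numbe5'. The sketch below compares it with number(\d), which requires the full word. The extra 'numbe5' value is hypothetical, added only for illustration; for the four example tags both patterns give the same digits.

```python
import re
import pandas as pd

# The first four tags come from the question; 'numbe5' is a made-up edge case.
s = pd.Series(['Number1', 'number2', 'aNumber8', 'Number6b', 'numbe5'])

# '*' repeats only the 'r', so 'numbe' followed directly by a digit also matches.
loose = s.str.extract(r'number*(\d)', flags=re.IGNORECASE, expand=False)

# Without '*', the digit must follow the complete word 'number'.
strict = s.str.extract(r'number(\d)', flags=re.IGNORECASE, expand=False)

print(loose.tolist())   # ['1', '2', '8', '6', '5']
print(strict.tolist())  # ['1', '2', '8', '6', nan]
```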
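# keep a reference to the extracted column
insertNum = df['Number']
# remove it from the end of the DataFrame
df.drop(labels=['Number'], axis=1, inplace=True)
# re-insert it as the first column
df.insert(0, 'Number', insertNum)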
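If you do go the append-then-move route, a shorter variant uses DataFrame.pop, which removes a column and returns it as a Series, so it can be passed straight to insert. A minimal sketch on the example data, reusing the extraction line from the question:

```python
import re
import pandas as pd

df = pd.DataFrame(['Number1', 'number2', 'aNumber8', 'Number6b'], columns=['Tagname'])
df['Number'] = df['Tagname'].str.extract(r'number*(\d)', re.IGNORECASE)

# pop() drops 'Number' from the end and returns it as a Series;
# insert() then places that Series at position 0.
df.insert(0, 'Number', df.pop('Number'))
print(df.columns.tolist())  # ['Number', 'Tagname']
```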
Number Tagname
1 Number1
2 number2
8 aNumber8
6 Number6b
What I hoped to do was call .insert() directly, but this raises the ValueError shown below:
df.insert(0, 'Number', df['Tagname'].str.extract(r'number*(\d)', re.IGNORECASE))
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
Is it possible to use .insert() this way?
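Use expand=False so that Series.str.extract returns a Series (this holds when the pattern has exactly one capture group). If you omit it, you get a DataFrame with one column per capture group, because the default is expand=True. That DataFrame is two-dimensional, which is why insert fails with "expected 1, got 2":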
Details:
print (df['Tagname'].str.extract(r'number*(\d)', re.IGNORECASE))
0
0 1
1 2
2 8
3 6
print (df['Tagname'].str.extract(r'number*(\d)', re.IGNORECASE, expand=False))
0 1
1 2
2 8
3 6
Name: Tagname, dtype: object
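To tie the output above to the error message, checking ndim on both results shows the difference directly: the default result is 2-D, while the expand=False result is the 1-D object insert expects. A minimal check on the example data:

```python
import re
import pandas as pd

df = pd.DataFrame(['Number1', 'number2', 'aNumber8', 'Number6b'], columns=['Tagname'])

# Default expand=True: a DataFrame with one column per capture group (2-D).
as_frame = df['Tagname'].str.extract(r'number*(\d)', flags=re.IGNORECASE)

# expand=False with a single capture group: a Series (1-D).
as_series = df['Tagname'].str.extract(r'number*(\d)', flags=re.IGNORECASE, expand=False)

print(type(as_frame).__name__, as_frame.ndim)    # DataFrame 2
print(type(as_series).__name__, as_series.ndim)  # Series 1
```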
df.insert(0,'Number',df['Tagname'].str.extract(r'number*(\d)', re.IGNORECASE, expand=False))
print (df)
Number Tagname
0 1 Number1
1 2 number2
2 8 aNumber8
3 6 Number6b
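Putting it together as a self-contained script. The astype(int) step is an assumption based on the question asking for an integer: the extracted values are strings (dtype: object), and this converts them to actual integers. It assumes every tag contains a match; a non-matching row would yield NaN and astype(int) would raise.

```python
import re
import pandas as pd

df = pd.DataFrame(['Number1', 'number2', 'aNumber8', 'Number6b'], columns=['Tagname'])

# expand=False returns a 1-D Series, which insert() accepts.
digits = df['Tagname'].str.extract(r'number*(\d)', flags=re.IGNORECASE, expand=False)

# Optional: convert the extracted strings to integers before inserting.
# Assumes every row matched; NaN values would make astype(int) fail.
df.insert(0, 'Number', digits.astype(int))
print(df)
```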