
Python: pandas.DataFrame.insert ValueError: Buffer has wrong number of dimensions


In a DataFrame, I want to extract a single-digit integer (0-9) from a string, where the digit always follows a specific word, and add it as a new column at a specific position (not at the end). In the simplified example below, I want to extract the digit that follows the word 'number' (case-insensitively).

DataFrame:

import re
import pandas as pd

testDf = ['Number1', 'number2', 'aNumber8', 'Number6b']
df = pd.DataFrame(testDf, columns=['Tagname'])

Tagname
Number1
number2
aNumber8
Number6b

The code below works, but because it appends the new column at the end of the DataFrame, I then have to move the column.

df['Number'] = df['Tagname'].str.extract(r'number*(\d)', re.IGNORECASE)

Tagname    Number
Number1     1
number2     2
aNumber8    8
Number6b    6
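A side note on the pattern: in `number*`, the `*` quantifier applies only to the preceding `r`, so the regex matches "numbe" followed by zero or more "r"s rather than the whole word. It gives the right result on this data, but a minimal sketch with a hypothetical extra tag `numbe5` (not part of the original data) shows how it differs from a pattern that requires the full word:

```python
import re
import pandas as pd

# "numbe5" is a hypothetical tag added only to show the difference between the patterns
s = pd.Series(['Number1', 'numbe5'])

# 'r*' means zero or more 'r', so this also accepts "numbe" directly followed by a digit
loose = s.str.extract(r'number*(\d)', re.IGNORECASE, expand=False)
# requires the complete word "number" before the digit
strict = s.str.extract(r'number(\d)', re.IGNORECASE, expand=False)

print(loose.tolist())
print(strict.tolist())
```

Here `loose` picks up the 5 from `numbe5`, while `strict` leaves that row missing (NaN); whether this matters depends on the real tag names.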

insertNum = df['Number']
df.drop(labels=['Number'], axis=1, inplace=True)
df.insert(0, 'Number', insertNum)

Number    Tagname
1         Number1
2         number2
8         aNumber8
6         Number6b
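As an aside, the drop-then-insert step can be collapsed with `DataFrame.pop`, which removes a column and returns it as a Series. A minimal sketch reusing the question's data and pattern:

```python
import re
import pandas as pd

df = pd.DataFrame(['Number1', 'number2', 'aNumber8', 'Number6b'], columns=['Tagname'])
df['Number'] = df['Tagname'].str.extract(r'number*(\d)', re.IGNORECASE)

# pop() removes 'Number' and returns it as a 1-D Series,
# which insert() then places back at position 0
df.insert(0, 'Number', df.pop('Number'))
print(df)
```

Because `df.pop('Number')` is evaluated before `insert` runs, the column no longer exists when it is re-inserted, so no duplicate-name error occurs.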

What I hoped to do was call .insert() directly, but this raises the ValueError shown below.

df.insert(0, 'Number', df['Tagname'].str.extract(r'number*(\d)', re.IGNORECASE))

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

Is it possible to use .insert() this way?


Solution

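  • Pass expand=False to Series.str.extract so that it returns a Series. If you omit it, the default expand=True makes it return a DataFrame (one column per capture group, here a single column), which is two-dimensional; that is what "(expected 1, got 2)" in the error refers to: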

    Details:

    print (df['Tagname'].str.extract(r'number*(\d)', re.IGNORECASE))
       0
    0  1
    1  2
    2  8
    3  6
    
    print (df['Tagname'].str.extract(r'number*(\d)', re.IGNORECASE, expand=False))
    0    1
    1    2
    2    8
    3    6
    Name: Tagname, dtype: object
    

    df.insert(0,'Number',df['Tagname'].str.extract(r'number*(\d)', re.IGNORECASE, expand=False))
    print (df)
      Number   Tagname
    0      1   Number1
    1      2   number2
    2      8  aNumber8
    3      6  Number6b
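Note that `str.extract` returns strings (the `dtype: object` shown above), not integers. If numeric values are needed, a minimal sketch, assuming every tag contains a digit after the word (a non-matching row would become NaN and `astype(int)` would then raise), is to convert before inserting:

```python
import re
import pandas as pd

df = pd.DataFrame(['Number1', 'number2', 'aNumber8', 'Number6b'], columns=['Tagname'])

# expand=False gives a 1-D Series of digit strings; astype(int) converts them to integers.
# Assumes every row matches; a NaN from a non-matching row would make astype(int) fail.
numbers = df['Tagname'].str.extract(r'number*(\d)', re.IGNORECASE, expand=False).astype(int)
df.insert(0, 'Number', numbers)
print(df)
```

Alternatively, keeping the default expand=True and selecting its single column, `df['Tagname'].str.extract(r'number*(\d)', re.IGNORECASE)[0]`, also yields a one-dimensional Series that `insert` accepts.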