
DataFrame.insert ValueError: Buffer has wrong number of dimensions (expected 1, got 2)


I have the following dataset (sample):

import numpy as np
import pandas as pd

df = pd.DataFrame({'col_1': ['Region1 (Y0001)', 'Region2 (Y0002)',
                             'Region3 (Y0003)', 'Region4 (Y0004)', 'Region5 (Y0005)'],
                   'col_2': np.arange(1, 6),
                   'col_3': np.arange(6, 11),
                   'col_4': np.arange(11, 16)})

NOTE: I had to change the real values, but the data types and structure are the same.

I can't figure out why I get this error when using DataFrame.insert():

df.insert(df.columns.get_loc('col_1'),
          'new_col',
          df['col_1'].str.extract(r'\((\w+)\)'))

To check that DataFrame.insert() itself works, I ran the following, and it succeeded:

df.insert(0,'Random_Col',55)

As far as I can tell, this error appeared after I upgraded pandas to 1.4.3; I didn't have this issue before. However, that doesn't explain why the check above ran without errors.
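One way to see why the scalar check succeeds while the str.extract call fails is to compare the dimensionality of the two objects passed as value. The sketch below rebuilds the sample frame and simply inspects both values; it does not call insert, so it behaves the same regardless of pandas version.

```python
import numpy as np
import pandas as pd

# Same sample frame as above
df = pd.DataFrame({'col_1': ['Region1 (Y0001)', 'Region2 (Y0002)',
                             'Region3 (Y0003)', 'Region4 (Y0004)',
                             'Region5 (Y0005)'],
                   'col_2': np.arange(1, 6),
                   'col_3': np.arange(6, 11),
                   'col_4': np.arange(11, 16)})

# Value used in the failing call: str.extract with its default expand=True
extracted = df['col_1'].str.extract(r'\((\w+)\)')

# Value used in the check that worked: a plain scalar
scalar = 55

print(type(extracted).__name__, extracted.ndim, extracted.shape)  # DataFrame 2 (5, 1)
print(np.ndim(scalar))  # 0
```

The scalar is 0-dimensional and gets broadcast down the new column, whereas the extracted value is a 2-dimensional (5, 1) DataFrame, which matches the "expected 1, got 2" in the error message.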

How can I resolve this error?


Solution

  • DataFrame.insert requires three arguments: loc, an int; column, a valid column name; and value, which is either a scalar or 1-dimensional data (e.g. a Series or array-like).

    As of pandas 1.4.3, str.extract returns a DataFrame by default (expand=True):

    df['col_1'].str.extract(r'\((\w+)\)')
    
           0
    0  Y0001
    1  Y0002
    2  Y0003
    3  Y0004
    4  Y0005
    

    The error message:

    ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

    indicates that a 2-dimensional structure (a DataFrame) was passed as value, which is one dimension more than expected.


    There are two ways to fix this.

    1. Since there is a single capture group, we can stop the output from expanding into a DataFrame with expand=False:
    df.insert(
        df.columns.get_loc('col_1'),
        'new_col',
        df['col_1'].str.extract(r'\((\w+)\)', expand=False)
    )
    

    OR

    2. Select a column from the output, in this case column 0:
    df.insert(
        df.columns.get_loc('col_1'),
        'new_col',
        df['col_1'].str.extract(r'\((\w+)\)')[0]  # Get capture group (column) 0
    )
    

    Either option produces the following df:

      new_col            col_1  col_2  col_3  col_4
    0   Y0001  Region1 (Y0001)      1      6     11
    1   Y0002  Region2 (Y0002)      2      7     12
    2   Y0003  Region3 (Y0003)      3      8     13
    3   Y0004  Region4 (Y0004)      4      9     14
    4   Y0005  Region5 (Y0005)      5     10     15
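To confirm that the two options are interchangeable, here is a small self-contained check. It is a sketch that rebuilds the sample frame twice (make_sample is a hypothetical helper name introduced only for this illustration) and applies each fix to its own copy.

```python
import numpy as np
import pandas as pd


def make_sample():
    """Hypothetical helper: rebuild the question's sample frame."""
    return pd.DataFrame({'col_1': ['Region1 (Y0001)', 'Region2 (Y0002)',
                                   'Region3 (Y0003)', 'Region4 (Y0004)',
                                   'Region5 (Y0005)'],
                         'col_2': np.arange(1, 6),
                         'col_3': np.arange(6, 11),
                         'col_4': np.arange(11, 16)})


pattern = r'\((\w+)\)'

# Option 1: expand=False returns a Series for a single capture group
df1 = make_sample()
df1.insert(df1.columns.get_loc('col_1'), 'new_col',
           df1['col_1'].str.extract(pattern, expand=False))

# Option 2: keep the default DataFrame output and select column 0 (a Series)
df2 = make_sample()
df2.insert(df2.columns.get_loc('col_1'), 'new_col',
           df2['col_1'].str.extract(pattern)[0])

print(df1.equals(df2))    # True
print(list(df1.columns))  # ['new_col', 'col_1', 'col_2', 'col_3', 'col_4']
```

Both calls hand insert a 1-dimensional Series, so the resulting frames are identical; expand=False is slightly more direct because it never builds the intermediate DataFrame.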