Tags: python, pandas, dataframe, numpy, binning

Using np.select to change mixed data types (int and str) in a Pandas column


I've been trying to map a column of my df into 4 categories (binning), but the column contains mixed values: numbers (stored as strings) and text. It looks something like this:

df['data_column'] = ['22', '8', '11', 'Text', '17', 'Text', '6']

The categories I've been trying to map them to:

- 1 to 10: superb
- 10 to 20: awesome
- 20 to 30: great
- 'Text': text

This is what I've tried so far:

my_criteria = [df['data_column'][df['data_column'] != 'Text'].astype('int64').between(1, 10),
               df['data_column'][df['data_column'] != 'Text'].astype('int64').between(10, 20),
               df['data_column'][df['data_column'] != 'Text'].astype('int64').between(20, 30),
               df['data_column'][df['data_column'] == 'Text']]

my_values = ['superb', 'awesome', 'great', 'text']

df['data_column'] = np.select(my_criteria, my_values, 0)

But I get this error: ValueError: shape mismatch: objects cannot be broadcast to a single shape. How can I fix this? Any help is welcome. The desired output:

df['data_column'] = ['great', 'superb', 'awesome', 'text', 'awesome', 'text', 'superb']

Thank you in advance!


Solution

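  • Every array in the condlist passed to np.select must have the same shape (or be broadcastable to one). Yours are not: the first three conditions are computed only over the 5 non-'Text' rows, while the fourth covers just the 2 'Text' rows. The fourth is also not a boolean mask at all; it holds the 'Text' strings themselves.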
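If you want to keep the np.select approach from the question, a minimal sketch is to parse the whole column once with pd.to_numeric(errors='coerce') so that every condition is a boolean mask over all 7 rows. The default label 'other' is my own placeholder (the question used 0); a string default simply keeps the output all strings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'data_column': ['22', '8', '11', 'Text', '17', 'Text', '6']})

# Parse the whole column at once; 'Text' becomes NaN, so no rows are dropped
nums = pd.to_numeric(df['data_column'], errors='coerce')

# Every condition is a boolean mask over all 7 rows (NaN compares as False)
my_criteria = [
    nums.between(1, 10),
    nums.between(10, 20),
    nums.between(20, 30),
    df['data_column'].eq('Text'),
]
my_values = ['superb', 'awesome', 'great', 'text']

# np.select uses the first True condition, so 10 -> 'superb' and 20 -> 'awesome'.
# 'other' is a hypothetical fallback label for rows that match nothing.
df['data_column'] = np.select(my_criteria, my_values, default='other')
print(df['data_column'].tolist())
```

Because np.select picks the first condition that is True, the overlapping edges in the question's ranges (10 and 20) resolve to the lower category.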


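    Alternatively, use pd.to_numeric with errors='coerce' to convert the column to numbers; anything that can't be parsed, such as 'Text', becomes NaN.

    Then use pd.cut to create your bins. Convert the categorical result back to strings and replace the resulting 'nan' entries with 'text'. Note that pd.cut bins are right-closed by default ((1, 10], (10, 20], (20, 30]), so include_lowest=True is needed for a value of exactly 1 to land in 'superb', and any number outside 1 to 30 will also end up as 'text'.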

    Given:

      data_column
    0          22
    1           8
    2          11
    3        Text
    4          17
    5        Text
    6           6
    

    Doing:

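    import pandas as pd

    # Non-numeric entries such as 'Text' become NaN
    df['data_column'] = pd.to_numeric(df['data_column'], errors='coerce')

    # Bins (1, 10], (10, 20], (20, 30]; include_lowest=True keeps 1 in 'superb'
    df['data_column'] = (pd.cut(df['data_column'], [1, 10, 20, 30], include_lowest=True,
                                labels=['superb', 'awesome', 'great'])
                           .astype(str)              # categorical -> str; NaN becomes 'nan'
                           .replace('nan', 'text'))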
    

    Output:

      data_column
    0       great
    1      superb
    2     awesome
    3        text
    4     awesome
    5        text
    6      superb
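To see how pd.cut treats the bin edges, here is a small sketch run on edge values. It wraps the same steps in a hypothetical helper, bin_scores, and, as a variation on the astype(str)/replace('nan', ...) step, keeps the result categorical by adding 'text' as a category and filling the NaNs with it. Without include_lowest=True, '1' would fall outside (1, 10] and be labeled 'text'.

```python
import pandas as pd


def bin_scores(col: pd.Series) -> pd.Series:
    """Hypothetical helper: bin numeric strings; anything not binned becomes 'text'."""
    nums = pd.to_numeric(col, errors='coerce')      # 'Text' -> NaN
    binned = pd.cut(nums, [1, 10, 20, 30],
                    labels=['superb', 'awesome', 'great'],
                    include_lowest=True)            # first bin also includes 1
    # Stay categorical: register 'text' as a category, then fill the NaNs with it
    return binned.cat.add_categories('text').fillna('text')


edges = pd.Series(['1', '10', '11', '20', '30', '35', 'Text'])
print(bin_scores(edges).tolist())
# ['superb', 'superb', 'awesome', 'awesome', 'great', 'text', 'text']
```

Note that 35 is labeled 'text' just like the literal 'Text'; if out-of-range numbers should be kept separate from non-numeric entries, they would need their own rule.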