pandas create new column from existing column values


I'm looking to create a new column based on the values in an existing column. If the existing value starts with 'abc' or 'def', set the new column to 'x'. Otherwise set it to 'y'.

The check should be case insensitive.

I have something that looks like this –

import pandas as pd

df = pd.DataFrame({'command': ['abc123', 'abcdef', 'hold',
                               'release', 'hold', 'abcxyz',
                               'kill', 'def123', 'hold'],
                   'name': ['fred', 'wilma', 'barney',
                            'fred', 'barney', 'betty',
                            'pebbles', 'dino', 'wilma'],
                   'date': ['2020-05', '2020-05', '2020-05',
                            '2020-06', '2020-06', '2020-06',
                            '2020-07', '2020-07', '2020-07']})

With a print -

   command     date     name
0   abc123  2020-05     fred
1   abcdef  2020-05    wilma
2     hold  2020-05   barney
3  release  2020-06     fred
4     hold  2020-06   barney
5   abcxyz  2020-06    betty
6     kill  2020-07  pebbles
7   def123  2020-07     dino
8     hold  2020-07    wilma

I'm looking to have something like this -

   command     date     name status
0   abc123  2020-05     fred      x
1   abcdef  2020-05    wilma      x
2     hold  2020-05   barney      y
3  release  2020-06     fred      y
4     hold  2020-06   barney      y
5   abcxyz  2020-06    betty      x
6     kill  2020-07  pebbles      y
7   def123  2020-07     dino      x
8     hold  2020-07    wilma      y

Using the following, I can get it to work when the value is an exact match -

def source(row):
    if row['command'] == 'abcdef':
        return 'x'
    else:
        return 'y'


# Apply the results from the above Function
df['source'] = df.apply(source, axis=1)

However, the command values could be anything, and I can't hard-code a search for every possibility.

I can't figure out how to get it to work using startswith.


Solution

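  • Use Series.str.startswith and np.where to build the conditional column. Lowercase the values first so the check is case insensitive:

    import numpy as np

    # True where the lowercased command starts with either prefix
    m = df['command'].str.lower().str.startswith(('abc', 'def'))
    df['status'] = np.where(m, 'x', 'y')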
    

    Or slice the first 3 characters, lowercase them, and use Series.isin (this works here because both prefixes are 3 characters long):

    m = df['command'].str[:3].str.lower()
    m = m.isin(['abc', 'def'])

    df['status'] = np.where(m, 'x', 'y')
    
       command     name     date status
    0   abc123     fred  2020-05      x
    1   abcdef    wilma  2020-05      x
    2     hold   barney  2020-05      y
    3  release     fred  2020-06      y
    4     hold   barney  2020-06      y
    5   abcxyz    betty  2020-06      x
    6     kill  pebbles  2020-07      y
    7   def123     dino  2020-07      x
    8     hold    wilma  2020-07      y
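
Since the question asks for a case-insensitive check, another option is a regex with Series.str.match, which only matches at the start of each string. This is a minimal sketch, not the only way to do it. The sample frame below is made up for illustration (the mixed-case values and the missing value are not in the question's data), and na=False is an assumption that missing commands should get 'y'.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: mixed case and a missing value,
# neither of which appears in the question's frame.
df = pd.DataFrame({'command': ['ABC123', 'abcdef', 'hold',
                               'Def123', None, 'xabc']})

# str.match anchors the pattern at the start of each string.
# case=False ignores case; na=False treats missing values as non-matches.
m = df['command'].str.match(r'abc|def', case=False, na=False)

df['status'] = np.where(m, 'x', 'y')
print(df)
```

Unlike the slicing approach, this does not depend on every prefix having the same length, so prefixes such as 'ab' and 'defg' could be mixed in the same pattern.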