python pandas dataframe split delimiter

Remove everything before a delimiter when not all cells contain that delimiter


I have a dataframe with a 'timezone' column. Some of the entries are listed as 'country/city', and I want them to be just 'city'. There were similar questions on Stack Overflow, from which I came up with the following.

df['timezone'] = df['timezone'].str.split('/').str[1]

However, this wiped out the entries that don't contain a '/' (they came back as NaN). I tried various other adaptations but couldn't get any of them to work.

Next I tried to construct a lambda function and use it with map, trying various adaptations of the code below. This didn't work either.

df['timezone'] = df['timezone'].map(lambda x: x.split('/').str[1]) 

#AttributeError: 'list' object has no attribute 'str'

Finally, I decided to write a loop (below). Python took a while to work through it and I was hopeful, but in the end nothing seemed to happen.

x = df['timezone']

for entry in x.items() :
    if x.str.contains('/') is True:
        x.str.split('/').str[1] 
        update(x) 
    else:
        pass

Any help or advice much appreciated, thanks.


Solution

  • Restrict the number of splits to 1 with n=1 (needed when the delimiter can occur more than once, so that everything after the first '/' is kept), and then index with str[-1] instead of str[1]:

    df   
           timezone
    0  country/city
    1           foo
    2           bar
    
    df['timezone'] = df['timezone'].str.split('/', n=1).str[-1]
    df
    
      timezone
    0     city
    1      foo
    2      bar
    

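    str[-1] handles the cases where there was nothing to split on: splitting a value that has no '/' yields a one-element list, so its last element is the original string, whereas str[1] indexes past the end of that list and returns NaN.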
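
    The same idea also repairs the map/lambda attempt from the question. Inside the lambda each value is a plain Python string, so x.split('/') returns an ordinary list, which is indexed with [-1] directly rather than through .str. Below is a minimal sketch in plain Python; the helper name strip_prefix and the sample values are made up for illustration and are not part of the original question or answer.

```python
def strip_prefix(value, sep="/"):
    """Return the part of `value` after the first `sep`.

    If `sep` does not occur, split() returns a one-element list,
    so [-1] gives back the original string unchanged.
    """
    return value.split(sep, 1)[-1]


# Hypothetical sample values, including one with more than one delimiter.
samples = ["country/city", "foo", "America/Argentina/Buenos_Aires"]
result = [strip_prefix(s) for s in samples]
print(result)
```

    Assuming the column holds only strings, the helper could be applied with df['timezone'].map(strip_prefix). Missing values (NaN) are floats and have no split method, so in that case the vectorised .str version above is probably the safer choice.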