Split a row into more rows based on a string (regex)

I have this df and I want to split it:

import pandas as pd

cities3 = {'Metropolitan': ['New York', 'Los Angeles', 'San Francisco'],
           'NHL': ['RangersIslandersDevils', 'KingsDucks', 'Sharks']}
cities4 = pd.DataFrame(cities3)

cities4

to get a new df like this one:

goal df

What code can I use?

Solution

You can split your column at every position where a lower-case letter is immediately followed by an upper-case one, using this zero-width regex:

(?<=[a-z])(?=[A-Z])

and then you can use the technique described in this answer to replace the column with its exploded version:

cities4 = cities4.assign(NHL=cities4['NHL'].str.split(r'(?<=[a-z])(?=[A-Z])')).explode('NHL')

Output:

    Metropolitan        NHL
0       New York    Rangers
0       New York  Islanders
0       New York     Devils
1    Los Angeles      Kings
1    Los Angeles      Ducks
2  San Francisco     Sharks
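To see what the lookbehind/lookahead pair does on its own, here is a minimal sketch using Python's built-in re module on the strings from the question. The helper name split_camel and the 'Red WingsBlues' input are made up for illustration, and splitting on an empty match like this assumes Python 3.7 or later.

```python
import re

# Zero-width boundary: a position preceded by a lower-case letter
# and followed by an upper-case letter. Nothing is consumed by the match.
BOUNDARY = re.compile(r'(?<=[a-z])(?=[A-Z])')

def split_camel(text):
    """Hypothetical helper: split concatenated names at each lower-to-upper boundary."""
    return BOUNDARY.split(text)

print(split_camel('RangersIslandersDevils'))  # ['Rangers', 'Islanders', 'Devils']
print(split_camel('KingsDucks'))              # ['Kings', 'Ducks']
print(split_camel('Sharks'))                  # ['Sharks']

# Made-up input: the space before 'W' is not a lower-case letter,
# so a multi-word name stays in one piece.
print(split_camel('Red WingsBlues'))          # ['Red Wings', 'Blues']
```

Note that a name with an internal capital directly after a lower-case letter (for example something shaped like 'McX') would also be split, so it is worth checking the result against your real data.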

If you want to reset the index (to 0..5), you can do this, either after the above command or as part of it:

cities4.reset_index(drop=True)

Output:

    Metropolitan        NHL
0       New York    Rangers
1       New York  Islanders
2       New York     Devils
3    Los Angeles      Kings
4    Los Angeles      Ducks
5  San Francisco     Sharks
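For the "as part of it" variant, one possible sketch is to let explode renumber the rows itself. This assumes pandas 1.1 or later, where explode accepts ignore_index; the variable name result is just for illustration.

```python
import pandas as pd

cities4 = pd.DataFrame({
    'Metropolitan': ['New York', 'Los Angeles', 'San Francisco'],
    'NHL': ['RangersIslandersDevils', 'KingsDucks', 'Sharks'],
})

result = (
    cities4
    # Turn each NHL string into a list of team names
    .assign(NHL=cities4['NHL'].str.split(r'(?<=[a-z])(?=[A-Z])'))
    # One row per list element; ignore_index=True renumbers rows 0..n-1
    # (assumes pandas >= 1.1)
    .explode('NHL', ignore_index=True)
)

print(result)
```

On older pandas versions, chaining .explode('NHL').reset_index(drop=True) should give the same result.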