I have a DataFrame where, in one of the columns, I only want to keep part of each string. In the example below, I only want to keep the people's names.
**Example:**
| column1 |
|---|
| 1.Joe Smith, NYC(212) |
| 2.Jane Doe, HOU(713) |
To remove everything to the left of the name, I used `df['column1'] = df['column1'].str.lstrip("0123456789.")`.
That worked, but I can't figure out how to remove everything from the comma onward so that only the name is left. Would a regex be better suited here?
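For comparison, a non-regex option is to keep the existing `lstrip` step and then split once on the first comma, keeping only the part before it. This is a minimal sketch that assumes every value has the same `N.Name, ...` shape as the two sample rows; the DataFrame built here is only illustrative.

```python
import pandas as pd

# Illustrative DataFrame built from the two sample rows
df = pd.DataFrame({"column1": ["1.Joe Smith, NYC(212)", "2.Jane Doe, HOU(713)"]})

# Strip the leading "1." / "2." numbering, as in the question
names = df["column1"].str.lstrip("0123456789.")

# Split once on the first comma and keep the text before it
names = names.str.split(",", n=1).str[0]

print(names.tolist())  # ['Joe Smith', 'Jane Doe']
```

Note that `lstrip` removes any leading digits and dots, so a name that itself started with a digit would also be trimmed.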
Thanks!
Try using a regex to extract the names:
```python
df['column1'].str.extract(r'\d+\.(.+?),', expand=False)
```
Output:

```
0    Joe Smith
1     Jane Doe
```
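To write the names back into the column, a sketch using the same sample data: with a single capture group, `expand=False` makes `str.extract` return a Series rather than a one-column DataFrame, so it can be assigned directly.

```python
import pandas as pd

df = pd.DataFrame({"column1": ["1.Joe Smith, NYC(212)", "2.Jane Doe, HOU(713)"]})

# One capture group + expand=False -> a Series that replaces the column
df["column1"] = df["column1"].str.extract(r"\d+\.(.+?),", expand=False)

print(df["column1"].tolist())  # ['Joe Smith', 'Jane Doe']
```

Rows that do not match the pattern (for example, a value with no comma) come back as `NaN` rather than raising an error.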
More details on the pattern:
- `\d+`: Match one or more digits.
- `\.`: Match a period (dot) character.
- `(.+?)`: Capture one or more characters (non-greedy) into a group.
- `,`: Match a comma character.
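The non-greedy `+?` matters when a value contains more than one comma: it stops at the first comma instead of the last. The sketch below compares the two with Python's `re` module on a hypothetical row that is not part of the original data.

```python
import re

pattern_lazy = r"\d+\.(.+?),"   # non-greedy, as in the answer
pattern_greedy = r"\d+\.(.+),"  # greedy, for comparison

# Hypothetical row with a second comma after the name
row = "3.Ann Lee, Jr., LAX(310)"

lazy = re.match(pattern_lazy, row).group(1)
greedy = re.match(pattern_greedy, row).group(1)

print(lazy)    # Ann Lee
print(greedy)  # Ann Lee, Jr.
```

The greedy `.+` first runs to the end of the string and then backtracks only as far as the last comma, which is why it keeps `, Jr.`.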