I have a pandas DataFrame "dt". One of its columns, "betName", holds strings that sometimes have a +/- number after the name. I'm trying to figure out how to split "betName" into two columns, "betName" and "line", where "betName" is just the name and "line" holds the +/- number or plain number.
Please see screenshots, thank you for helping!
Try this (updated) code:
df2=df['betName'].str.split(r' (?=[+-]\d{1,}\.?\d{,}?)', expand=True).astype('str')
Explanation. You can use str.split to split the text in each row into two or more columns by a regular expression:
' (?=[+-]\d{1,}\.?\d{,}?)'
' ' - a literal space character comes first.
(...) - marks the start and end of a group.
?= - lookahead assertion. (?=...) matches if ... matches next, but doesn’t consume any of the string, so only the space is removed by the split.
[+-] - a set of characters. It matches + or -.
\d{1,} - \d is a digit from 0 to 9, and {min,max} gives the number of repetitions. Here it means one or more digits: 1, 200, 4000, etc.
\.? - \. is a literal dot, and ? means 0 or 1 repetitions of the preceding group or symbol.
\d{,}? - zero or more further digits (the decimal part); the trailing ? makes the repetition lazy.
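To see what the lookahead does on its own, here is a minimal sketch using the standard re module with the same pattern. The sample strings are made up for illustration, since the real "betName" values are only shown in the screenshots.

```python
import re

# Same pattern as in the answer: a space that is followed by a signed number.
PATTERN = r' (?=[+-]\d{1,}\.?\d{,}?)'

# Hypothetical sample values (assumed, not taken from the question's data).
samples = ["Los Angeles Lakers -3.5", "Over +150", "Draw"]

for text in samples:
    # Only the space is consumed; the sign and digits stay in the second piece.
    print(re.split(PATTERN, text))
```

Spaces inside the name are left alone because they are not followed by + or - and a digit, and a string without a signed number comes back as a single piece.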
str.split(pat=None, n=-1, expand=False)
pat - string or regular expression to split on. If not specified, split on whitespace.
n - limit on the number of splits in the output. None, 0 and -1 are all interpreted as "return all splits".
expand - whether to expand the split strings into separate columns. True places the split pieces into separate columns; False returns a Series/Index holding a list of strings for each row.
Calling .astype('str') converts the resulting DataFrame to string type.
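The difference between the two expand settings can be sketched as follows. The frame below is a hypothetical stand-in for the real data, and the variable names are only for illustration.

```python
import pandas as pd

# Hypothetical data standing in for the asker's frame.
df = pd.DataFrame({"betName": ["Lakers -3.5", "Over +150", "Draw"]})
pattern = r' (?=[+-]\d{1,}\.?\d{,}?)'

# expand=False (the default): one Series whose values are Python lists.
as_lists = df["betName"].str.split(pattern)

# expand=True: a DataFrame with one column per piece, labelled 0, 1, ...
# Rows with fewer pieces get a missing value in the extra columns.
as_columns = df["betName"].str.split(pattern, expand=True)

# astype('str') may turn those missing entries into literal text such as
# 'None', depending on the pandas version, so check before relying on it.
as_text = as_columns.astype("str")

print(as_lists)
print(as_columns)
```

If the missing entries should stay missing (for example, to convert "line" to numbers later), it may be simpler to skip the .astype('str') step.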
The output.
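The result can be reproduced with a small sketch that writes the two pieces back into "betName" and "line", as asked in the question. The rows are invented for illustration, and the sketch assumes at least one row contains a signed number, otherwise the second column would not exist.

```python
import pandas as pd

# Hypothetical rows; the real values are only visible in the screenshots.
df = pd.DataFrame({"betName": ["Lakers -3.5", "Over +150", "Draw"]})

# Split once per row on the space that precedes a signed number.
parts = df["betName"].str.split(r' (?=[+-]\d{1,}\.?\d{,}?)', expand=True)

# Column 0 is the name, column 1 the signed number (missing where absent).
df["betName"] = parts[0]
df["line"] = parts[1]

print(df)
```

Note that this pattern only splits on numbers with an explicit + or - sign, so names followed by an unsigned number are left unchanged.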