Tags: python, dataframe, object, split, calculated-columns

Need help splitting a column in my DataFrame (Python)


I have a pandas DataFrame "dt". One of its columns, "betName", holds strings (object dtype) that sometimes have a +/- number after the name. I'm trying to figure out how to split "betName" into two columns, "betName" and "line", where "betName" is just the name and "line" holds the +/- number or a regular (unsigned) number.

Please see the screenshots. Thank you for helping!

[Screenshot: example of problem and desired result]

[Screenshot: dt["betName"]]


Solution

  • Try this (updated) code:

    df2 = dt['betName'].str.split(r' (?=[+-]\d{1,}\.?\d{,}?)', expand=True).astype('str')
    

    Explanation: str.split splits the text in each row into two or more columns, using a regular expression as the separator:

      r' (?=[+-]\d{1,}\.?\d{,}?)'
    

    ' ' - a literal space comes first. It is the only character the split actually consumes.

    () - Indicates the start and end of a group.

    ?= - Lookahead assertion. Matches if ... matches next, but doesn’t consume any of the string.

    [+-] - a set of characters. It will match + or -.

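    \d{1,} - \d matches a digit from 0 to 9, and {min,max} sets how many times it may repeat. {1,} means one or more digits: 1, 200, 4000, etc.

    \.? - \. matches a literal dot, and ? allows 0 or 1 repetitions of the preceding group or symbol.

    \d{,}? - zero or more further digits (the fractional part). The trailing ? makes the repetition lazy. Because \.? and \d{,}? can both match nothing, the lookahead succeeds as soon as a sign and one digit follow the space.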
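The pattern can be tried on its own with Python's re module before applying it to a DataFrame. This is a minimal sketch: the sample strings are hypothetical stand-ins for the values in the screenshots, not the original data.

```python
import re

# Same pattern as in the answer: a space, then a lookahead for a signed number.
pattern = r' (?=[+-]\d{1,}\.?\d{,}?)'

# Hypothetical sample values (assumed, not taken from the question's data).
samples = ["Lakers +3.5", "Boston Celtics -110", "Draw"]

# Only the space is consumed, so the sign stays attached to the number.
parts = [re.split(pattern, s) for s in samples]

for s, p in zip(samples, parts):
    print(s, "->", p)
# Lakers +3.5 -> ['Lakers', '+3.5']
# Boston Celtics -110 -> ['Boston Celtics', '-110']
# Draw -> ['Draw']
```

Note that the space inside "Boston Celtics" is left alone, because no sign and digit follow it.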

    str.split(pat=None, n=-1, expand=False)

    pat - string or regular expression to split on. If not specified, split on whitespace.

    n - maximum number of splits to perform. None, 0 and -1 are all interpreted as "return all splits".

    expand - expand the split strings into separate columns.

    • True returns a DataFrame with each split piece in its own column.
    • False returns a Series/Index in which each row holds a list of strings.
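The effect of expand and n can be compared side by side. The following is a small sketch with hypothetical sample values; the Series is created with object dtype because that is how the question describes the column.

```python
import pandas as pd

# Hypothetical sample data (assumed); object dtype, as described in the question.
s = pd.Series(["Lakers +3.5", "Boston Celtics -110", "Draw"], dtype=object)
pattern = r' (?=[+-]\d{1,}\.?\d{,}?)'

as_lists = s.str.split(pattern)                 # expand=False: one list per row
as_columns = s.str.split(pattern, expand=True)  # expand=True: one column per piece
first_space = s.str.split(" ", n=1)             # n=1: stop after the first split

print(as_lists.tolist())
print(as_columns.shape)
print(first_space.tolist())
```

With expand=True, a row that has fewer pieces than the widest row ("Draw" here) gets a missing value in the extra column.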

    Finally, .astype('str') converts every column of the resulting DataFrame to the string type.

    The output.
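To get the two named columns the question asks for, the split result can be assigned back to the frame. This is a sketch under assumptions: the rows are hypothetical, n=1 is added here to cap the result at two pieces, and .astype('str') is left out so that names without a number keep a missing value in "line". Like the answer's pattern, it only handles signed numbers.

```python
import pandas as pd

# Hypothetical rows standing in for the screenshot data (assumed values).
dt = pd.DataFrame(
    {"betName": pd.Series(["Lakers +3.5", "Boston Celtics -110", "Draw"], dtype=object)}
)

# n=1 limits each row to at most two pieces: the name and the signed number.
parts = dt["betName"].str.split(r' (?=[+-]\d{1,}\.?\d{,}?)', n=1, expand=True)

dt["betName"] = parts[0]  # the name only
dt["line"] = parts[1]     # the signed number, missing where the name had none

print(dt)
```

If no row contains a signed number at all, the split returns a single column and parts[1] does not exist, so real data may need a check for that case.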
