Search code examples
pythonpandasdataframedata-analysis

Expand a string from a column into different separate columns in Pandas


I have a df in the following form:

id   sid      steps
A     1       step1
A     1    step1-step2
A     1  step1-step2-step3

This contains data of how a user A went through a certain series of pages (steps) in a given session (sid). I want to take these dash delimitated steps and create individual columns for every step.

Result:

id     sid      steps       page_step1 page_step2  page_step3
  A     1       step1         step1        NA           NA
  A     1    step1-step2      step1      step2          NA
  A     1  step1-step2-step3  step1      step2        step3

I don't know exactly how many steps so I was hoping that they're created dynamically. Been stuck on this all week, thanks!


Solution

  • Use str.split:

    >>> df.join(df["steps"].str.split("-",expand=True).add_prefix("page_step"))
    
     id  sid              steps page_step0 page_step1 page_step2
    0  A    1              step1      step1       None       None
    1  A    1        step1-step2      step1      step2       None
    2  A    1  step1-step2-step3      step1      step2      step3
    

    If the numbering must start from 1 instead of 0:

    steps = df["steps"].str.split("-",expand=True)
    output = df.join(steps.rename(columns={i: f"page_setup{i+1}" for i in steps.columns}))
    
    >>> output
      id  sid              steps page_setup1 page_setup2 page_setup3
    0  A    1              step1       step1        None        None
    1  A    1        step1-step2       step1       step2        None
    2  A    1  step1-step2-step3       step1       step2       step3