pandas, dataframe, one-to-many, multi-index

How to manage row spans and column spans with two-level indexing


I have the following dataframe, mapping a one-to-many relationship between "courses" and "lessons":

   course_id       course_name  lesson_id     lesson_title
0          0          Learn C#          1              foo
1          0          Learn C#          2              bar
2          0          Learn C#          3              baz
3          1  Origami together          1        the crane
4          1  Origami together          2  crease patterns
5          2        WIP course          1        the first

How do I format it so that:

  • each lesson row falls within the span of the course row it belongs to

  • lesson_id and lesson_title columns are under the span of a common lessons column

as shown below:

                                            lessons
   course_id       course_name         id            title
0          0          Learn C#          1              foo
1                                       2              bar
2                                       3              baz
3          1  Origami together          1        the crane
4                                       2  crease patterns
5          2        WIP course          1        the first

and producing an output similar to this when exported to Excel:

[Image: expected output table]

Looking at similar questions, I found that the accepted answers use a multi-index, but in this case the first level of the index would have to include all course-related columns.

On top of that, the starting table is actually dynamically generated from the corresponding Course and Lesson dataclasses, so I fear this approach wouldn't scale well if I were to add attributes to the Course class.

Ideally I would index by course_id and lesson_id, then specify which columns are indexed by the former or the latter, thus avoiding course attributes being duplicated for each lesson.

Is there a way to achieve that?


Solution

  • If a MultiIndex on both the index and the columns is acceptable, use:

    out = df.set_index(['course_id','course_name'])
    out.columns = out.columns.str.split('_', expand=True)
    

    If row spans are needed for both index levels, here is a trick: add a helper column of empty strings as a third index level:

    out = df.assign(**{'':''}).set_index(['course_id','course_name', ''])
    out.columns = out.columns.str.split('_', expand=True)
    
    print (out)
                                lesson                 
                                    id            title
    course_id course_name                              
    0         Learn C#               1              foo
                                     2              bar
                                     3              baz
    1         Origami together       1        the crane
                                     2  crease patterns
    2         WIP course             1        the first
    

    To remove the helper (third) column from the Excel output:

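    import xlwings as xw  # drives a local Excel installation

    file = 'out.xlsx'
    out.to_excel(file)

    # Reopen the exported workbook and delete column C,
    # which holds the empty helper index level
    wb = xw.Book(file)
    wb.sheets['Sheet1'].range('C:C').delete()
    wb.save(file)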
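
The steps above can be put together as a self-contained sketch that rebuilds the sample table from the question and applies the helper-column trick. Two things here are assumptions and not part of the answer above: the index columns are selected by a `course_` name prefix (one possible way to cope with new attributes being added to the Course class), and `n=1` is passed to `str.split` so that a name with several underscores still splits into exactly two levels.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "course_id": [0, 0, 0, 1, 1, 2],
    "course_name": ["Learn C#"] * 3 + ["Origami together"] * 2 + ["WIP course"],
    "lesson_id": [1, 2, 3, 1, 2, 1],
    "lesson_title": ["foo", "bar", "baz", "the crane", "crease patterns", "the first"],
})

# Assumption: every course attribute lives in a column prefixed "course_",
# so attributes added later are picked up without editing this code.
course_cols = [c for c in df.columns if c.startswith("course_")]

# Helper column of empty strings, used as the last index level
out = df.assign(**{"": ""}).set_index(course_cols + [""])

# Split the remaining column names on the first underscore only
# ("lesson_title" -> ("lesson", "title")) to build the column MultiIndex
out.columns = out.columns.str.split("_", n=1, expand=True)

print(out)
```

Exporting `out` with `to_excel` and removing the helper column then works as shown above.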