Search code examples
pythonpandasscheduled-taskstimetable

Python Pandas Cells with different size


I am currently working on a small project (Python 3.6.2) for automated timetable creation based on input I crawl from my college website. For creating the timetable and storing the appointments (and later reorganizing them, to find "better" schedules) I currently use Pandas DataFrame structure and am not fully satisfied. I want appointments to occupy multiple cells.

08:55:00    
09:40:00    
09:50:00    Mod, Spez, Sem
11:30:00    
11:40:00    
13:20:00    
13:30:00    Systemnahe und parallele Programmierung - Ü 01
15:10:00    
15:20:00    
16:00:00    
16:05:00    
16:15:00    Modellierung, Spezifikation und Semantik - Ü 02
17:00:00    
17:55:00    

The first appointment for example takes place from 09:50 to 11:40 and should therefore occupy these slots. A quick solution would be to just put the name of this appointment in every timeslot it occupies but this wouldn't feel and look very clean. I want to export it to Excel when I'm finished. So my question is, how do I accomplish this or am I wrong with using DataFrames all along?


Solution

  • You need to first forward-fill your data, then group by then appoint, and finally summarize the time column:

    from io import StringIO
    import pandas
    
    raw = StringIO("""\
    08:55:00;
    09:40:00;
    09:50:00;Mod, Spez, Sem
    11:30:00;
    11:40:00;
    13:20:00;
    13:30:00;Systemnahe und parallele Programmierung - Ü 01
    15:10:00;
    15:20:00;
    16:00:00;
    16:05:00;
    16:15:00;Modellierung, Spezifikation und Semantik - Ü 02
    17:00:00;
    17:55:00;
    """)
    
    df = (
        pandas.read_table(raw, sep=';', header=None, names=['time', 'appt'], parse_dates=['time'])
            .fillna(method='ffill')
            .assign(offset=lambda df: df['appt'].shift(-1))
            .query('appt == offset')
            .groupby('appt')['time']
            .describe()[['first', 'last']]
            .rename(columns={'first': 'begin', 'last': 'end'})
            .sort_values(by=['begin'])
            .reset_index()
    )
    

    And that give me:

                                                  appt                begin                  end
    0                                   Mod, Spez, Sem  2017-11-21 09:50:00  2017-11-21 11:40:00
    1   Systemnahe und parallele Programmierung - Ü 01  2017-11-21 13:30:00  2017-11-21 16:00:00
    2  Modellierung, Spezifikation und Semantik - Ü 02  2017-11-21 16:15:00  2017-11-21 17:00:00