I am currently working on a small project (Python 3.6.2) for automated timetable creation based on input I crawl from my college website. For creating the timetable and storing the appointments (and later reorganizing them, to find "better" schedules) I currently use Pandas DataFrame structure and am not fully satisfied. I want appointments to occupy multiple cells.
08:55:00
09:40:00
09:50:00 Mod, Spez, Sem
11:30:00
11:40:00
13:20:00
13:30:00 Systemnahe und parallele Programmierung - Ü 01
15:10:00
15:20:00
16:00:00
16:05:00
16:15:00 Modellierung, Spezifikation und Semantik - Ü 02
17:00:00
17:55:00
The first appointment for example takes place from 09:50 to 11:40 and should therefore occupy these slots. A quick solution would be to just put the name of this appointment in every timeslot it occupies but this wouldn't feel and look very clean. I want to export it to Excel when I'm finished. So my question is, how do I accomplish this or am I wrong with using DataFrames all along?
You need to first forward-fill your data, then group by then appoint, and finally summarize the time column:
from io import StringIO
import pandas
raw = StringIO("""\
08:55:00;
09:40:00;
09:50:00;Mod, Spez, Sem
11:30:00;
11:40:00;
13:20:00;
13:30:00;Systemnahe und parallele Programmierung - Ü 01
15:10:00;
15:20:00;
16:00:00;
16:05:00;
16:15:00;Modellierung, Spezifikation und Semantik - Ü 02
17:00:00;
17:55:00;
""")
df = (
pandas.read_table(raw, sep=';', header=None, names=['time', 'appt'], parse_dates=['time'])
.fillna(method='ffill')
.assign(offset=lambda df: df['appt'].shift(-1))
.query('appt == offset')
.groupby('appt')['time']
.describe()[['first', 'last']]
.rename(columns={'first': 'begin', 'last': 'end'})
.sort_values(by=['begin'])
.reset_index()
)
And that give me:
appt begin end
0 Mod, Spez, Sem 2017-11-21 09:50:00 2017-11-21 11:40:00
1 Systemnahe und parallele Programmierung - Ü 01 2017-11-21 13:30:00 2017-11-21 16:00:00
2 Modellierung, Spezifikation und Semantik - Ü 02 2017-11-21 16:15:00 2017-11-21 17:00:00