
Scheduling Employees - what data structure to use?


Question

I'm trying to write a simple employee scheduling application for about 10-20 people at my software development company. After some consideration I settled on writing a web app in Python, Ruby or PHP backed by a Postgres/MySQL database. While designing the database models, I began to wonder which data structure would actually be best for this kind of application.

What it will look like

The app's month view would look similar to this:

 OCTOBER    1 2 3 4 5 6 7 8 9 ...
John Apple  M M A A N N O O O ...
Daisy Pear  O O O M M A A N N ...
Steve Cat   A A N N O O O M M ...
Maria Dog   N N O O O M M A A ...

where M = Morning shift, A = Afternoon shift, and so on (the letters could be replaced with codes).

What data structure or database design would be best for this? I was thinking about storing one string per employee per month (at most 31 characters, one character per day), e.g. "MMAANNOOOAAMMNNAAOO..."; a Month table would then hold one such string for each employee.
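
To see the trade-off of the string approach, here is a minimal Python sketch that expands one such month string into a (date, shift code) pair per day. The function name decode_month and the year 2009 are assumptions for illustration only. Any per-day question, such as counting N shifts, has to unpack the string like this first, whereas a table with one row per day and shift can answer it with a plain SQL query.

```python
from __future__ import annotations

from datetime import date


def decode_month(year: int, month: int, codes: str) -> list[tuple[date, str]]:
    """Expand a one-char-per-day month string into (date, shift code) pairs."""
    # Character i (0-based) is day i + 1 of the month. date() raises
    # ValueError if the string is longer than the month (e.g. a 31st
    # character for a 30-day month), so a malformed string is caught here.
    return [(date(year, month, day), code)
            for day, code in enumerate(codes, start=1)]


# Usage: the first nine days of John Apple's row from the month view
# (the year is a hypothetical choice; the view only says OCTOBER).
rows = decode_month(2009, 10, "MMAANNOOO")
night_shifts = sum(1 for _, code in rows if code == "N")
print(night_shifts)  # 2
```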

What would you suggest?


Solution

  • I would go with a three-table Kimball star schema (Date, Employee, Schedule), because sooner or later you will be asked to produce (demanding) reports from this data. Who worked the most nights? Who worked the most weekends? Who never works weekends? Why am I always scheduled on Friday afternoons? On which day of the week are certain employees most likely not to show up? Etc., etc.

    The tables would be:

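    CREATE TABLE dimDate (
        KeyDate         INTEGER PRIMARY KEY
      , FullDate        DATE NOT NULL
      , Year            INTEGER NOT NULL     -- filtered on by the holiday query
      , DayOfWeek       VARCHAR(10) NOT NULL
      , DayNumberInWeek INTEGER NOT NULL
      , IsHoliday       CHAR(3) NOT NULL     -- 'Yes' or 'No'
      -- , ... more here
    );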
    

    You can pre-fill the dimDate table for 10 years or so; you may need to tweak the IsHoliday column from time to time.
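
    Pre-filling can be a one-off script. A minimal Python sketch, assuming a "smart" YYYYMMDD integer for KeyDate and a holiday set you maintain by hand (both are illustrative choices, not part of the design above); it also emits a Year column, since the holiday query in this answer filters on d.Year. Each dict maps one-to-one onto an INSERT into dimDate.

```python
from datetime import date, timedelta


def build_dim_date(start: date, end: date, holidays: set) -> list:
    """Return one dimDate row (a dict) per day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            # Assumed "smart" key in YYYYMMDD form, e.g. 20091001.
            "KeyDate": day.year * 10000 + day.month * 100 + day.day,
            "FullDate": day.isoformat(),
            "Year": day.year,
            "DayOfWeek": day.strftime("%A"),      # weekday name; locale-dependent
            "DayNumberInWeek": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
            "IsHoliday": "Yes" if day in holidays else "No",
        })
        day += timedelta(days=1)
    return rows


# Usage: ten years of dates with a hypothetical, hand-maintained holiday set.
rows = build_dim_date(date(2009, 1, 1), date(2018, 12, 31),
                      holidays={date(2009, 12, 25), date(2010, 1, 1)})
print(len(rows))  # 3652
```

Tweaking IsHoliday later is then a simple UPDATE on the affected KeyDate rows.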

    The dimEmployee table also changes relatively rarely.

    CREATE TABLE dimEmployee (
        KeyEmployee INTEGER PRIMARY KEY
      , FirstName   VARCHAR(50) NOT NULL
      , LastName    VARCHAR(50) NOT NULL
      , Age         INTEGER
      -- , ... more here
    );
    

    The Schedule table is where you fill in the work schedule. I have also added HoursOfWork to each shift row, which makes it easy to aggregate hours in reports such as "How many hours did John Doe work on holidays last year?"

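    CREATE TABLE factSchedule (
        KeySchedule  INTEGER PRIMARY KEY    -- surrogate PK
      , KeyDate      INTEGER NOT NULL       -- FK to dimDate table
      , KeyEmployee  INTEGER NOT NULL       -- FK to dimEmployee table
      , Shift        INTEGER NOT NULL       -- shift number (degenerate dimension)
      , HoursOfWork  DECIMAL(4,2) NOT NULL  -- number of work hours in that shift
      , FOREIGN KEY (KeyDate)     REFERENCES dimDate (KeyDate)
      , FOREIGN KEY (KeyEmployee) REFERENCES dimEmployee (KeyEmployee)
    );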
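
    The month view from the question can still be rebuilt from factSchedule rows. A minimal Python sketch, assuming (hypothetically) that Shift numbers 1, 2 and 3 stand for M, A and N, that a day with no row is displayed as O, and that the rows have already been joined to dimDate and filtered to one month; the names month_grid and SHIFT_LETTERS are illustrative.

```python
from collections import defaultdict

# Hypothetical mapping from factSchedule.Shift to the letters in the month view.
SHIFT_LETTERS = {1: "M", 2: "A", 3: "N"}
OFF = "O"  # shown when an employee has no shift that day


def month_grid(fact_rows, employees, days_in_month):
    """Build the month view: one string per employee, one letter per day.

    fact_rows: iterable of (day_of_month, key_employee, shift) tuples.
    employees: dict of KeyEmployee -> display name.
    If someone has two shifts on one day, the later row overwrites the
    earlier one in this sketch.
    """
    cells = defaultdict(lambda: [OFF] * days_in_month)
    for day, key_employee, shift in fact_rows:
        cells[key_employee][day - 1] = SHIFT_LETTERS[shift]
    # Employees without any rows get an all-OFF line via the defaultdict.
    return {name: "".join(cells[key]) for key, name in employees.items()}


grid = month_grid([(1, 10, 1), (2, 10, 1), (3, 10, 2)],
                  {10: "John Apple", 11: "Daisy Pear"}, days_in_month=5)
print(grid)  # {'John Apple': 'MMAOO', 'Daisy Pear': 'OOOOO'}
```

So the compact string is a presentation format produced at display time rather than the storage format.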
    

    Instead of the surrogate KeySchedule, you could combine KeyDate, KeyEmployee and Shift into a composite primary key, which guarantees that the same person cannot be scheduled on the same shift on the same day. If you keep the surrogate key, enforce this rule in the application layer (or with a UNIQUE constraint on those three columns). When querying, join the tables like this:

    SELECT SUM(s.HoursOfWork)
     FROM factSchedule AS s
     JOIN dimDate      AS d ON s.KeyDate = d.KeyDate
     JOIN dimEmployee  AS e ON s.KeyEmployee = e.KeyEmployee
    WHERE e.FirstName='John'
      AND e.LastName='Doe'
      AND d.Year = 2009
      AND d.IsHoliday ='Yes';
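
    One self-contained way to try the composite key and the query above is Python's built-in sqlite3 module with an in-memory database. This is a sketch under assumptions: SQLite stands in for Postgres/MySQL, the dimension tables are trimmed to the columns the query needs, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dimDate (
    KeyDate   INTEGER PRIMARY KEY,
    FullDate  TEXT NOT NULL,
    Year      INTEGER NOT NULL,
    IsHoliday TEXT NOT NULL
);
CREATE TABLE dimEmployee (
    KeyEmployee INTEGER PRIMARY KEY,
    FirstName   TEXT NOT NULL,
    LastName    TEXT NOT NULL
);
-- Composite primary key instead of the surrogate KeySchedule.
-- (SQLite enforces REFERENCES only after PRAGMA foreign_keys = ON.)
CREATE TABLE factSchedule (
    KeyDate     INTEGER NOT NULL REFERENCES dimDate (KeyDate),
    KeyEmployee INTEGER NOT NULL REFERENCES dimEmployee (KeyEmployee),
    Shift       INTEGER NOT NULL,
    HoursOfWork REAL NOT NULL,
    PRIMARY KEY (KeyDate, KeyEmployee, Shift)
);
""")
conn.executemany("INSERT INTO dimDate VALUES (?, ?, ?, ?)", [
    (20091224, "2009-12-24", 2009, "No"),
    (20091225, "2009-12-25", 2009, "Yes"),
])
conn.execute("INSERT INTO dimEmployee VALUES (1, 'John', 'Doe')")
conn.executemany("INSERT INTO factSchedule VALUES (?, ?, ?, ?)", [
    (20091224, 1, 1, 8.0),
    (20091225, 1, 1, 8.0),
    (20091225, 1, 3, 4.0),
])

# Same person, same shift, same day: the composite key rejects it.
try:
    conn.execute("INSERT INTO factSchedule VALUES (20091225, 1, 1, 8.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

holiday_hours = conn.execute("""
    SELECT SUM(s.HoursOfWork)
      FROM factSchedule AS s
      JOIN dimDate      AS d ON s.KeyDate = d.KeyDate
      JOIN dimEmployee  AS e ON s.KeyEmployee = e.KeyEmployee
     WHERE e.FirstName = 'John'
       AND e.LastName  = 'Doe'
       AND d.Year = 2009
       AND d.IsHoliday = 'Yes'
""").fetchone()[0]
print(duplicate_rejected, holiday_hours)  # True 12.0
```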
    

    If you are using MySQL, it is OK to use MyISAM as the storage engine (MyISAM does not enforce foreign keys) and implement your foreign keys (FKs) as "logical only", letting the application layer take care of referential integrity.
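
    Taking care of referential integrity in the application layer boils down to checking that the referenced dimension rows exist before writing a fact row. A minimal Python sketch of that idea; add_shift is a hypothetical helper, and sqlite3 stands in for a MyISAM table that declares no enforced FKs.

```python
import sqlite3


def add_shift(conn, key_date, key_employee, shift, hours):
    """Insert a factSchedule row only if both referenced dimension rows exist.

    Application-layer stand-in for foreign keys the storage engine does
    not enforce. Returns True if the row was inserted, False otherwise.
    """
    date_exists = conn.execute(
        "SELECT 1 FROM dimDate WHERE KeyDate = ?", (key_date,)).fetchone()
    employee_exists = conn.execute(
        "SELECT 1 FROM dimEmployee WHERE KeyEmployee = ?",
        (key_employee,)).fetchone()
    if date_exists is None or employee_exists is None:
        return False
    conn.execute(
        "INSERT INTO factSchedule (KeyDate, KeyEmployee, Shift, HoursOfWork)"
        " VALUES (?, ?, ?, ?)",
        (key_date, key_employee, shift, hours))
    return True


# Usage with tables that declare no FOREIGN KEY constraints at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dimDate (KeyDate INTEGER PRIMARY KEY);
CREATE TABLE dimEmployee (KeyEmployee INTEGER PRIMARY KEY);
CREATE TABLE factSchedule (KeyDate INTEGER, KeyEmployee INTEGER,
                           Shift INTEGER, HoursOfWork REAL);
INSERT INTO dimDate VALUES (20091001);
INSERT INTO dimEmployee VALUES (1);
""")
print(add_shift(conn, 20091001, 1, 1, 8.0))   # True
print(add_shift(conn, 20091001, 99, 1, 8.0))  # False: employee 99 does not exist
```

Note that check-then-insert is not atomic: with concurrent users, a dimension row could be deleted between the check and the insert, so in a real app you would run both inside one transaction or accept the small risk for a 10-20 person tool.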

    Hope this helps.

