Tags: python, design-patterns, datetime, architecture, system-design

Design - How to handle timestamps in storage and when performing computations (Python)


My application deals with lots of data from different sources, in different time zones and formats, and I'm trying to determine how best to store that data AND work with it.

For example, should I store everything as UTC? This means when I fetch data I need to determine what timezone it is currently in, and if it's NOT UTC, do the necessary conversion to make it so. (Note, I'm in EST).

Then, when performing computations on the data, should I extract it (say it's UTC) and convert it into MY time zone (EST), so it makes sense when I'm looking at it? Or should I keep it in UTC and do all my calculations there?

A lot of this data is time series and will be graphed, and the graph will be in EST.

This is a Python project, so let's say I have a data structure that is:

"id1": {
    "interval": 60,                             # seconds, subDict['interval']
    "last": "2013-01-29 02:11:11.151996+00:00"  # UTC, subDict['last']
},

And I need to operate on this by determining whether the current time (now()) is > last + interval (have the 60 seconds elapsed?). So in code:

import datetime
import dateutil.parser

lastTime = dateutil.parser.parse(subDict['last'])
# Timezone-aware "now" in UTC, so it compares cleanly with the aware lastTime
utcNow = datetime.datetime.now(datetime.timezone.utc)

if lastTime + datetime.timedelta(seconds=subDict['interval']) < utcNow:
    print("Time elapsed, do something!")

Does that make sense? I'm working with UTC everywhere, both stored and computationally...

Also, if anyone has links to good write-ups on how to work with timestamps in software, I'd love to read them. Possibly something like a Joel on Software article for timestamp usage in applications?


Solution

  • It seems to me as though you're already doing things 'the right way'. Users will probably expect to interact in their local time zone (input and output), but it's normal to store normalized dates in UTC format so that they are unambiguous and to simplify calculation. So, normalize to UTC as soon as possible, and localize as late as possible.
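
To make "normalize to UTC as soon as possible" concrete, here is a minimal sketch that converts incoming ISO 8601 strings to aware UTC datetimes at the point of ingestion. The function name to_utc and its assumed_tz parameter are hypothetical, and the sketch assumes the inputs are in a format that datetime.fromisoformat accepts (Python 3.7+); messier formats would need a more forgiving parser such as dateutil.

```python
from datetime import datetime, timedelta, timezone

def to_utc(text, assumed_tz=timezone.utc):
    """Parse an ISO 8601 string and return an aware datetime in UTC.

    assumed_tz (hypothetical parameter) is the fixed-offset zone to assume
    when the source did not include an offset in the string.
    """
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Naive input: attach the zone we were told this source uses.
        parsed = parsed.replace(tzinfo=assumed_tz)
    return parsed.astimezone(timezone.utc)

# The same instant reported by three sources in three different ways.
a = to_utc("2013-01-28 21:11:11.151996-05:00")   # explicit EST offset
b = to_utc("2013-01-29 02:11:11.151996+00:00")   # already UTC
c = to_utc("2013-01-28 21:11:11.151996",         # no offset, known to be UTC-5
           assumed_tz=timezone(timedelta(hours=-5)))
assert a == b == c
```

Once everything is an aware UTC datetime, comparisons and arithmetic no longer depend on where the data came from.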

    Some small amount of information about Python and timezone processing can be found here:

    My current preference is to store dates as Unix timestamp tv_sec values (whole seconds since the epoch) in backend storage, and convert them to Python datetime.datetime objects during processing. Processing will usually be done with a datetime object in the UTC timezone, which is then converted to the local user's timezone just before output. I find that having a rich object such as a datetime.datetime helps with debugging.
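
The round trip described above could look roughly like the following sketch. The helper names to_storage and from_storage are hypothetical, and the fixed UTC-5 offset is only a stand-in for the asker's EST; a real application would more likely use a named zone that follows daylight saving rules. Note that storing whole seconds drops any sub-second precision.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset standing in for the user's zone (an assumption).
EST = timezone(timedelta(hours=-5), "EST")

def to_storage(moment):
    """Aware datetime -> whole seconds since the Unix epoch."""
    return int(moment.timestamp())

def from_storage(seconds):
    """Stored epoch seconds -> aware datetime in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# Store: only an integer goes to the backend.
stored = to_storage(datetime(2013, 1, 29, 2, 11, 11, tzinfo=timezone.utc))

# Process: all arithmetic stays in UTC.
loaded = from_storage(stored)
due = loaded + timedelta(seconds=60)

# Present: localize only at the very end.
display = due.astimezone(EST).strftime("%Y-%m-%d %H:%M:%S %Z")
print(display)  # 2013-01-28 21:12:11 EST
```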

    Time zones are a nuisance to deal with, and you probably need to determine on a case-by-case basis whether it's worth the effort to support them correctly.

    For example, let's say you're calculating daily counts for bandwidth used. Some questions that may arise are:

    1. What happens on a daylight saving boundary? Should you just assume that a day is always 24 hours for ease of calculation, or do you need to check in every daily calculation whether the day has fewer or more hours because of a daylight saving change?
    2. When presenting a localized time, does it matter if a time is repeated? E.g. if you have an hourly report displayed in local time without a time zone attached, will it confuse the user to have a missing hour of data, or a repeated hour of data, around daylight saving changes?
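
Both questions can be illustrated with a short sketch. It assumes Python 3.9+ with the zoneinfo module and an available time zone database, and it uses America/New_York as a guess at the asker's "EST"; the function names local_day_hours and hourly_labels are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")  # assumed zone for illustration

def local_day_hours(year, month, day, zone):
    """Return the real length, in hours, of a local calendar day."""
    start = datetime(year, month, day, tzinfo=zone)
    end = start + timedelta(days=1)  # next local midnight (wall-clock step)
    # Convert to UTC before subtracting: subtracting two datetimes that
    # share a tzinfo ignores the offsets and would always give 24 hours.
    elapsed = end.astimezone(timezone.utc) - start.astimezone(timezone.utc)
    return elapsed / timedelta(hours=1)

def hourly_labels(first_utc, count, zone, fmt="%H:%M"):
    """Format consecutive UTC hours as local labels for a report."""
    return [
        (first_utc + timedelta(hours=i)).astimezone(zone).strftime(fmt)
        for i in range(count)
    ]

print(local_day_hours(2013, 1, 29, NEW_YORK))  # 24.0, ordinary day
print(local_day_hours(2013, 3, 10, NEW_YORK))  # 23.0, clocks go forward
print(local_day_hours(2013, 11, 3, NEW_YORK))  # 25.0, clocks go back

# Four consecutive UTC hours around the autumn change.
first = datetime(2013, 11, 3, 4, 0, tzinfo=timezone.utc)
print(hourly_labels(first, 4, NEW_YORK))
# ['00:00', '01:00', '01:00', '02:00']  <- the 01:00 label repeats
print(hourly_labels(first, 4, NEW_YORK, fmt="%H:%M %Z"))
# ['00:00 EDT', '01:00 EDT', '01:00 EST', '02:00 EST']
```

Including the zone abbreviation (or the UTC offset) in the label is one way to keep the repeated hour distinguishable; whether that is worth the extra clutter depends on the report.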