I'm trying to determine (as my application is dealing with lots of data from different sources and different time zones, formats, etc) how best to store my data AND work with it.
For example, should I store everything as UTC? This means when I fetch data I need to determine what timezone it is currently in, and if it's NOT UTC, do the necessary conversion to make it so. (Note, I'm in EST).
Then, when performing computations on the data, should I extract (say it's UTC) and get into MY time zone (EST), so it makes sense when I'm looking at it? I should I keep it in UTC and do all my calculations?
A lot of this data is time series and will be graphed, and the graph will be in EST.
This is a Python project, so lets say I have a data structure that is:
"id1": {
"interval": 60, <-- seconds, subDict['interval']
"last": "2013-01-29 02:11:11.151996+00:00" <-- UTC, subDict['last']
},
And I need to operate on this, by determine if the current time (now()) is > the last + interval (has the 60 second elapsed)? So in code:
lastTime = dateutil.parser.parse(subDict['last'])
utcNow = datetime.datetime.utcnow().replace(tzinfo=tz.tzutc())
if lastTime + datetime.timedelta(seconds=subDict['interval']) < utcNow:
print "Time elapsed, do something!"
Does that make sense? I'm working with UTC everywhere, both stored and computationally...
Also, if anyone has links to good write-ups on how to work with timestamps in software, I'd love to read it. Possibly like a Joel On Software for timestamp usage in applications ?
It seems to me as though you're already doing things 'the right way'. Users will probably expect to interact in their local time zone (input and output), but it's normal to store normalized dates in UTC format so that they are unambiguous and to simplify calculation. So, normalize to UTC as soon as possible, and localize as late as possible.
Some small amount of information about Python and timezone processing can be found here:
My current preference is to store dates as unix timestamp tv_sec
values in backend storage, and convert to Python datetime.datetime
objects during processing. Processing will usually be done with a datetime
object in the UTC timezone and then converted to a local user's timezone just before output. I find having that having a rich object such as a datetime.datetime
helps with debugging.
Timezone are a nuisance to deal with and you probably need to determine on a case-by-case basis whether it's worth the effort to support timezones correctly.
For example, let's say you're calculating daily counts for bandwidth used. Some questions that may arise are: