Search code examples
sqldatabasetriggersbackenddatabase-performance

Who is responsible for modifying timestamp fields in the app: backend or db?


For example: we have 2 fields in certain entity: created_at and updated_at. We can update those fields manually on backend after create or update operations, or create a trigger on the DB side that will fill/update these fields for us automatically.

There are some cases to consider:

  1. Usually, on the backend after create or update we return the json of the object. In this case it'd be nice to see those timestamp fields set up on return, however if a trigger makes the modifying for us, to see these updated timestamps, backend would have to make another select just to set up these timestamps to nicely return it to the client.
  2. Sometimes backend engineers can forget to update these fields manually leading to null records.
  3. Not a DBA specialist myself but what do you think of the cost of the triggers? Especially in high RPS. Should I not worry about the performance that triggers have for such simple updates in the high-load systems?

Solution

  • Who is responsible for modifying timestamp fields in the app: backend or db?

    It depends, it can be either.

    There is no one "right" answer for which times to use or exclude. Depending on your system, which actors perform time-based actions (users, devices, servers, triggers), any (or all) of the list below might make sense to incorporate.

    Depending on your system, you might have one (or more) of the following:

    • time A – when a user performs an action

      • this is most likely local device time (whatever the phone or computer thinks is current time)
      • but: anything is possible, a client could get a time from who-knows-where and report that to you
      • could be when a user did something (tap a button) and not when the message was sent to the backend
      • could be 10-20 seconds (or more) after a user did something (tap a button), and gets assigned by the device when it sends out batched data
    • time B – when the backend gets involved

      • this is server time, and could be when the server receives the data, or after the server has received and processed the data, and is about to hand it off to the next player (database, another server, etc)
      • note: this is probably different from "time A" due to transit time between user and backend
      • also, there's no guarantee that different servers in the mix all agree on time.. they can and should, but should not be relied upon as truth
    • time C – when a value is stored in the database

      • this is different from server time (B)
      • a server might receive inbound data at B, then do some processing which takes time, then finally submits an insert to the database (which then assigns time C)

    Another highly relevant consideration in capturing time is the accuracy (or rather, the likely inaccuracy) of client-reported time. For example, a mobile device can claim to have sent a message at time X, when in fact the clock is just set incorrectly and actual time is minutes, hours - even days - away from reported time X (in the future or in the past). I've seen this kind of thing occur where data arrives in a system, claiming to be from months ago, but we can prove from other telemetry that it did in fact arrive recently (today or yesterday). Never trust device-reported times. This applies to a device – mobile, tablet, laptop, desktop – all of them often have internal clocks that are not accurate.

    Remote servers and your database are probably closer to real, though they can be wrong in various ways. However, even if wrong, when the database auto-assigns datetimes to two different rows, you can trust that one of them really did arrive after or before the other – the time might be inaccurate relative to actual time, but they're accurate relative to each other.

    All of this becomes further complicated if you intend to piece together history by using timestamps from multiple origins (A, B and C). It's tempting to do, and sometimes it works out fine, but it can easily be nonsense data. For example, it might seem safe to piece together history using a user time A, then a server time B, and database time C. Surely they're all in order – A happened first, then B, then C; so clearly all of the times should be ascending in value. But these are often out of order. So if you need to piece together history for something important, it's a good idea to look for secondary confirmations of order of events, and don't rely on timestamps.

    Also on the subject of timestamps: store everything in UTC – database values, server times, client/device times were possible. Timezones are the worst.