c++undefined-behavior integer-overflow c++-chrono

How can I actually use std::chrono types without risking overflow and undefined behavior?

I've used std::chrono for years and have watched many of Howard Hinnant's talks on the design and use of the library. I like it, and I think I generally understand it. However, recently, I suddenly realized that I didn't know how to use it practically and safely to avoid undefined behaviour.

Please bear with me while I walk through a few cases to set the stage for my question.

Let's start with what I think is the "simplest" std::chrono::duration type, nanoseconds. Its minimum rep size is 64-bit, which means in practice it will be std::int64_t and there are thus probably no "leftover" optional representational bits that aren't required to be there by the standard.

This function is obviously not always safe:

nanoseconds f1(nanoseconds value)
{ return ++value; }

If value is nanoseconds::max(), then this overflows, which we can confirm with clang 7's UBSan (-fsanitize=undefined):

runtime error: signed integer overflow: 9223372036854775807 + 1 cannot be
represented in type 'std::__1::chrono::duration<long long,
std::__1::ratio<1, 1000000000> >::rep' (aka 'long long')

But this is nothing special. It's no different than the typical integer case:

std::int64_t f2(std::int64_t value)
{ return ++value; }

When we can't be sure that value isn't already its maximum, we check first, and handle the error however we deem appropriate. For example:

nanoseconds f3(nanoseconds value)
{
  if(value == value.max())
  {
    throw std::overflow_error{"f3"};
  }
  return ++value;
}

If we have an existing (unknown) nanoseconds value that we want to add another (unknown) nanoseconds value to, the naive approach is:

struct Foo
{
  // Pretend this can be set in other meaningful ways so we
  // don't know what it is.
  nanoseconds m_nanos = nanoseconds::max();

  nanoseconds f4(nanoseconds value)
  { return m_nanos + value; }
};

And again, we'll get into trouble:

runtime error: signed integer overflow: 9223372036854775807 +
9223372036854775807 cannot be represented in type 'long long'
Foo{}.f4(nanoseconds::max()) = -2 ns

So, again, we can do as we would with the integers, but it's already getting trickier because these are signed integers:

struct Foo
{
  explicit Foo(nanoseconds nanos = nanoseconds::max())
    : m_nanos{nanos}
  {}
  // Again, pretend this can be set in other ways, so we don't
  // know what it is.
  nanoseconds m_nanos;

  nanoseconds f5(nanoseconds value)
  {
    if(m_nanos > m_nanos.zero() && value > m_nanos.max() - m_nanos)
    {
      throw std::overflow_error{"f5+"};
    }
    else if(m_nanos < m_nanos.zero() && value < m_nanos.min() - m_nanos)
    {
      throw std::overflow_error{"f5-"};
    }
    return m_nanos + value;
  }
};

Foo{}.f5(0ns) = 9223372036854775807 ns
Foo{}.f5(nanoseconds::min()) = -1 ns
Foo{}.f5(1ns) threw std::overflow_error: f5+
Foo{}.f5(nanoseconds::max()) threw std::overflow_error: f5+
Foo{nanoseconds::min()}.f5(0ns) = -9223372036854775808 ns
Foo{nanoseconds::min()}.f5(nanoseconds::max()) = -1 ns
Foo{nanoseconds::min()}.f5(-1ns) threw std::overflow_error: f5-
Foo{nanoseconds::min()}.f5(nanoseconds::min()) threw std::overflow_error: f5-

I think I got that right. It's starting to get harder to be certain if the code is correct.

So far, things might seem manageable, but what about this case?

nanoseconds f6(hours value)
{ return m_nanos + value; }

We have the same problem as we did with f4(). Can we solve it the same way as f5() did? Let's use the same body as f5(), but just change the argument type, and see what happens:

nanoseconds f7(hours value)
{
  if(m_nanos > m_nanos.zero() && value > m_nanos.max() - m_nanos)
  {
    throw std::overflow_error{"f7+"};
  }
  else if(m_nanos < m_nanos.zero() && value < m_nanos.min() - m_nanos)
  {
    throw std::overflow_error{"f7-"};
  }
  return m_nanos + value;
}

This seems sane because we're still checking if there's room between nanoseconds::max() and m_nanos to add in value. So what happens when we run this?

Foo{}.f7(0h) = 9223372036854775807 ns
/usr/lib/llvm-7/bin/../include/c++/v1/chrono:880:59: runtime error: signed
integer overflow: -9223372036854775808 * 3600000000000 cannot be represented
in type 'long long'
Foo{}.f7(hours::min()) = 9223372036854775807 ns
Foo{}.f7(1h) threw std::overflow_error: f7+
Foo{}.f7(hours::max()) DIDN'T THROW!!!!!!!!!!!!!!
Foo{nanoseconds::min()}.f7(0h) = -9223372036854775808 ns
terminating with uncaught exception of type std::overflow_error: f7-
Aborted

Oh my. That definitely didn't work.

In my test driver, the UBSan error is printed above the call that it is reporting on, so the first failure is Foo{}.f7(hours::min()). But that case shouldn't even throw, so why did it fail?

The answer is that even the act of comparing hours to nanoseconds involves a conversion. This is because the comparison operators are implemented via the use of std::common_type, which std::chrono defines for duration types in terms of the greatest common divisor of the period values. In our case, that's nanoseconds, so first, the hours is converted to nanoseconds. A snippet from libc++ shows part of this:

template <class _LhsDuration, class _RhsDuration>
struct __duration_lt
{
  _LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR
  bool operator()(const _LhsDuration& __lhs, const _RhsDuration& __rhs) const
  {
    typedef typename common_type<_LhsDuration, _RhsDuration>::type _Ct;
    return _Ct(__lhs).count() < _Ct(__rhs).count();
  }
};

Since we didn't check that our hours value was small enough to fit in nanoseconds (on this particular standard library implementation, with its particular rep type choices), the following are essentially equivalent:

if(m_nanos > m_nanos.zero() && value > m_nanos.max() - m_nanos)

if(m_nanos > m_nanos.zero() && nanoseconds{value} > m_nanos.max() - m_nanos)

As an aside, the same problem will exist if hours uses a 32-bit rep:

runtime error: signed integer overflow: 2147483647 * 3600000000000 cannot be
represented in type 'long long'

Of course, if we make the value small enough, including by limiting the rep size, we can end up making it fit . . . since obviously some hours values can be represented as nanoseconds or conversions would be pointless.

Let's not give up yet. Conversions are another important case anyway, so we should know how to handle them safely. Surely that can't be too hard.

The first hurdle is that we need to know if we can even get from hours to nanoseconds without overflowing the nanoseconds::rep type. Again, do as we would with integers and do a multiplication overflow check. For the moment, let's ignore negative values. We could do this:

nanoseconds f8(hours value)
{
  assert(value >= value.zero());
  if(value.count()
     > std::numeric_limits<nanoseconds::rep>::max() / 3600000000000)
  {
    throw std::overflow_error{"f8+"};
  }
  return value;
}

It seems to work if we test it against the limits of our standard library's choice of nanoseconds::rep:

f8(0h) = 0 ns
f8(1h) = 3600000000000 ns
f8(2562047h) = 9223369200000000000 ns
f8(2562048h) threw std::overflow_error: f8+
f8(hours::max()) threw std::overflow_error: f8+

But, there are some pretty serious limitations. First, we had to "know" how to convert between hours and nanoseconds, which kind of defeats the point. Secondly, this only handles these very particular two types with a very nice relationship between their period types (where only a single multiplication is needed).

Imagine we wanted to implement an overflow-safe conversion of just the standard named duration types, supporting only the lossless conversions:

template <typename target_duration, typename source_duration>
target_duration lossless(source_duration duration)
{
  // ... ?
}

It seems like we need to compute relationships between ratios and make decisions and check the multiplications based on that . . . and once we've done that, we've had to understand and re-implement all of the logic in the duration operators (but now with overflow safety) that we originally set out to use in the first place! We can't really need to implement the type just to use the type, can we?

Plus, when we're done, we just have some function, lossless(), that performs a conversion if we explicitly call it instead allowing natural implicit conversions, or some other function that adds a value if we explicitly call it instead of using operator+(), so we've lost the expressiveness that is a huge part of the value of duration.

Add into the mix lossy conversions with duration_cast and it seems hopeless.

I'm not even sure how I would approach dealing with something as simple as this:

template <typename duration1, typename duration2>
bool isSafe(duration1 limit, duration2 reading)
{
  assert(limit >= limit.zero());
  return reading < limit / 2;
}

Or, worse, this, even if I knew some things about grace:

template <typename duration1, typename duration2>
bool isSafe2(duration1 limit, duration2 reading, milliseconds grace)
{
  assert(limit >= limit.zero());
  assert(grace >= grace.zero());
  const auto test = limit / 2;
  return grace < test && reading < (test - grace);
}

If duration1 and duration2 can really be any duration type (including things like std::chrono::duration<std::int16_t, std::ratio<3, 7>>, I can't see a way to proceed with confidence. But even if we limit ourselves to "normal" duration types, there are a lot of scary outcomes.

In some ways this situation is no "worse" than dealing with normal fixed-size integers, like everyone does every day, where you often "ignore" the possibility of overflow because you "know" the domain of values you're working with. But, surprisingly to me, these types of solutions seem "worse" with std::chrono than they do with normal integers because as soon as you try to be safe with respect to overflow, you end up defeating the benefits of using std::chrono in the first place.

If I make my own duration types based on an unsigned rep, I guess I technically avoid at least some of the undefined behaviour from an integer overflow point of view, but I still can easily get garbage results from "careless" calculations. The "problem space" seems to be the same.

I'm not interested in a solution based on floating point types. I'm using std::chrono to keep the precise resolutions I choose in each case. If I didn't care about being precise or rounding errors, I could easily just use double to count seconds everywhere and not mix units. But if that was a viable solution for every problem, we wouldn't have std::chrono (or even struct timespec, for that matter).

So my question is, how do I safely and practically use std::chrono to do something as simple as add two values together of different durations without fear of undefined behaviour because of integer overflow? Or do lossless conversions safely? I haven't figured out a practical solution even with known simple duration types, let alone the rich universe of all possible duration types. What am I missing?

Solution

The highest performing answer is to know your domain, and don't program anywhere near the maximum range of the precision you're using. If you're using nanoseconds, the range is +/- 292 years. Don't go near that far. If you need more range than just +/- 100 years, use a coarser resolution than nanoseconds.

If you can follow those rules, then you can just not worry about overflow.

Sometimes you can't. For example if your code must handle untrusted input, or general input (e.g. a general purpose library), then you really do need to check for overflow.

One technique is to choose a rep just for comparison that can handle more range than anyone needs, just for the comparison. int128_t and double are two tools I reach for in this case. For example, here is a checked_convert that checks for overflow using double prior to actually performing the duration_cast:

template <class Duration, class Rep, class Period>
Duration
checked_convert(std::chrono::duration<Rep, Period> d)
{
    using namespace std::chrono;
    using S = duration<double, typename Duration::period>;
    constexpr S m = Duration::min();
    constexpr S M = Duration::max();
    S s = d;
    if (s < m || s > M)
        throw std::overflow_error("checked_convert");
    return duration_cast<Duration>(d);
}

It is significantly more expensive. But if you are writing (for example) std::thread::sleep_for, it is worth the expense.

If for some reason you can't using floating point even for checks, I have experimented with lcm_type (not a great name). This is the opposite of common_type_t<Duration1, Duration2>. Instead of finding the duration that both input durations can convert to without loss (without a division), it finds the duration that both input durations can convert to without a multiplication. For example lcm_type_t<milliseconds, nanoseconds> has type milliseconds. Such a conversion can not overflow.

template <class Duration0, class ...Durations>
struct lcm_type;

template <class Duration>
struct lcm_type<Duration>
{
    using type = Duration;
};

template <class Duration1, class Duration2>
struct lcm_type<Duration1, Duration2>
{
    template <class D>
    using invert = std::chrono::duration
                   <
                       typename D::rep,
                       std::ratio_divide<std::ratio<1>, typename D::period>
                   >;

    using type = invert<typename std::common_type<invert<Duration1>,
                                                  invert<Duration2>>::type>;
};

template <class Duration0, class Duration1, class Duration2, class ...Durations>
struct lcm_type<Duration0, Duration1, Duration2, Durations...>
{
    using type = typename lcm_type<
                     typename lcm_type<Duration0, Duration1>::type,
                     Duration2, Durations...>::type;
};

template <class ...T>
using lcm_type_t = typename lcm_type<T...>::type;

You can convert both input durations to lcm_type_t<Duration1, Duration2>, without fear of overflow, then do the comparison.

The problem with this technique is that it is not precise. Two slightly different durations might convert to the lcm_type_t and because of truncation losses, compare equal. For this reason I prefer the solution with double, but it is good to have lcm_type in your toolbox too.