Why doesn't the [[no_unique_address]] attribute work in some cases?

I'm playing around with the [[no_unique_address]] attribute introduced in C++20. As far as I understand from the cppreference article and the dcl.attr.nouniqueaddr section of the Standard, this attribute indicates that the field need not have an address distinct from all other non-static data members of the class, so the compiler can optimize the memory layout of the struct. But there is one thing that confuses me.

Consider the following example (https://godbolt.org/z/fj6nGebcs):

struct Empty {};
struct Test {
  [[no_unique_address]] Empty em1;
  char f1;
  int f2;
  [[no_unique_address]] Empty em2;
};

The field em1 has zero size and is located at the same address as f1. It seems logical to me that the same optimization could be applied to em2, so that it would share an address with f2 and also have zero size. But it doesn't work that way. After some experimentation, it looks like a [[no_unique_address]] field declared at the end of the struct doesn't get this optimization at all. Compiled with the latest versions of gcc and clang, the size of struct Test is 12 (1-byte char field, 3 bytes of padding, 4-byte int field, then the Empty field taking 1 byte plus 3 bytes of padding).

Now let's move em2 up (https://godbolt.org/z/Goz9Tj66K):

struct Empty {};
struct Test {
  [[no_unique_address]] Empty em1;
  char f1;
  [[no_unique_address]] Empty em2;
  int f2;
};

After this change, the size of struct Test is 8, and the em2 field has the same address as f2.

I don't understand why it works this way. I found tests in clang that check the size of this exact struct, and the latest gcc behaves the same way.

What I can't figure out is why compilers can't apply the [[no_unique_address]] optimization in the first example the way they do in the second. Is there a technical limitation, or is it down to how the compilers are implemented internally?

Solution

The behavior of [[no_unique_address]] is always at the discretion of the compiler; it is never required to do anything. A compiler can ignore it in all cases, respect it in some and ignore it in others, or respect it all the time. As long as the compiler is consistent rather than random about it, it can do whatever it wants (within the other rules of layout).

So why does it work in one case and not the other? Because that's how the compiler vendor implemented it.

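If you want to get the most out of it, make things as easy as possible for the compiler. Don't use the same empty type multiple times: two subobjects of the same type must always have distinct addresses, so em1 and em2 can never share an offset with each other, which limits where the compiler can put the second one. Declare all of the empty fields first.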
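As a rough illustration of that advice, here is a small experiment comparing the layout from the question with one that uses distinct empty types declared up front. The names EmptyA, EmptyB, SameTypeTail, DistinctFirst and print_sizes are made up for this sketch, and the sizes it prints depend entirely on your compiler and ABI, so treat it as something to try rather than a guarantee.

```cpp
#include <cstddef>
#include <cstdio>

struct Empty {};
struct EmptyA {};  // hypothetical: one distinct empty type per member
struct EmptyB {};

// Layout from the question: the same empty type twice, the second one last.
struct SameTypeTail {
  [[no_unique_address]] Empty em1;
  char f1;
  int f2;
  [[no_unique_address]] Empty em2;
};

// Distinct empty types, declared before the non-empty members.
// Because the types differ, both are allowed to overlap the same storage.
struct DistinctFirst {
  [[no_unique_address]] EmptyA em1;
  [[no_unique_address]] EmptyB em2;
  char f1;
  int f2;
};

// Prints the sizes chosen by the current compiler; no values are assumed here.
void print_sizes() {
  std::printf("SameTypeTail:  %zu\n", sizeof(SameTypeTail));
  std::printf("DistinctFirst: %zu\n", sizeof(DistinctFirst));
}
```

Calling print_sizes() from main on the compiler you care about shows whether the reordering actually pays off there; a compiler that ignores the attribute is still conforming.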

Alternatively, you'll have to track down the ABI rules your compiler is using. The Itanium C++ ABI, which gcc and clang follow on Linux, has particular provisions for no_unique_address.
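Before digging through an ABI document, it can help to look at what the compiler actually did. The sketch below prints the size and member offsets of the two structs from the question using offsetof. The names TailEmpty, MiddleEmpty, print_layout and print_layouts are invented for this example. The only thing it asserts is what the language itself guarantees: em1 and em2 have the same type, so their offsets must differ. Everything else is left for the compiler to report.

```cpp
#include <cstddef>
#include <cstdio>

struct Empty {};

// First example from the question: em2 declared last.
struct TailEmpty {
  [[no_unique_address]] Empty em1;
  char f1;
  int f2;
  [[no_unique_address]] Empty em2;
};

// Second example from the question: em2 declared between f1 and f2.
struct MiddleEmpty {
  [[no_unique_address]] Empty em1;
  char f1;
  [[no_unique_address]] Empty em2;
  int f2;
};

// Two subobjects of the same type may never share an address.
static_assert(offsetof(TailEmpty, em1) != offsetof(TailEmpty, em2),
              "same-type members need distinct addresses");
static_assert(offsetof(MiddleEmpty, em1) != offsetof(MiddleEmpty, em2),
              "same-type members need distinct addresses");

// Prints one line describing a layout; all values come from the caller.
void print_layout(const char* name, std::size_t size, std::size_t em1,
                  std::size_t f1, std::size_t f2, std::size_t em2) {
  std::printf("%s: size=%zu em1=%zu f1=%zu f2=%zu em2=%zu\n",
              name, size, em1, f1, f2, em2);
}

// Reports both layouts as chosen by the current compiler.
void print_layouts() {
  print_layout("TailEmpty", sizeof(TailEmpty),
               offsetof(TailEmpty, em1), offsetof(TailEmpty, f1),
               offsetof(TailEmpty, f2), offsetof(TailEmpty, em2));
  print_layout("MiddleEmpty", sizeof(MiddleEmpty),
               offsetof(MiddleEmpty, em1), offsetof(MiddleEmpty, f1),
               offsetof(MiddleEmpty, f2), offsetof(MiddleEmpty, em2));
}
```

Calling print_layouts() from main shows exactly where each em2 ended up, which you can then check against the layout rules of your ABI.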