Scalar `new T` vs array `new T[1]`


We recently discovered that some code was systematically using new T[1] (properly matched with delete[]), and I'm wondering whether this is harmless, or whether there are downsides in the generated code (in space or time/performance). Of course, this was hidden behind layers of functions and macros, but that's beside the point.

Logically, it appears to me that both are similar, but are they?

Are compilers allowed to turn this code into a scalar new T? (It uses a literal 1, not a variable, but through the layers of functions that 1 becomes an argument variable two or three times before reaching the code that actually performs new T[n].)

Any other considerations/things to know about the difference between these two?


Solution

  • If T doesn't have a trivial destructor, then with typical compiler implementations, new T[1] has an overhead compared to new T. The array version allocates a slightly larger memory area in order to store the number of elements, so that at delete[] time it knows how many destructors must be called.

    So, it has an overhead:

    • a slightly larger memory area must be allocated
    • delete[] will be slightly slower, as it needs a loop to call the destructors instead of a single destructor call (the difference is the loop overhead); a conceptual sketch of this expansion follows below
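
    To make this concrete, here is a conceptual sketch (not real library or compiler code) of what delete[] p roughly expands to for a type with a non-trivial destructor, assuming the common layout where the element count (the "cookie") is stored just before the array; the name delete_array is made up for illustration:

    #include <cstddef>
    #include <new>
    
    template <typename T>
    void delete_array(T *p) {               // conceptually: delete[] p
        if (p) {
            // read the element count stored in front of the array
            char *block = reinterpret_cast<char *>(p) - sizeof(std::size_t);
            std::size_t n = *reinterpret_cast<std::size_t *>(block);
            for (std::size_t i = n; i > 0; --i)
                p[i - 1].~T();              // destructors run in reverse order
            ::operator delete[](block);     // free the whole block, cookie included
        }
    }

    By contrast, delete p is just one destructor call followed by a single deallocation, with no count to read and no loop.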

    Check out this program:

    #include <cstddef>
    #include <iostream>
    
    enum Tag { tag };
    
    char buffer[128];
    
    // Placement overloads that report the requested size and hand back a
    // static buffer, so we can observe how much memory the compiler asks for.
    void *operator new(size_t size, Tag) {
        std::cout << "single: " << size << "\n";
        return buffer;
    }
    void *operator new[](size_t size, Tag) {
        std::cout << "array: " << size << "\n";
        return buffer;
    }
    
    struct A {   // trivially destructible
        int value;
    };
    
    struct B {   // non-trivial (user-provided) destructor
        int value;
    
        ~B() {}
    };
    
    int main() {
        new(tag) A;      // scalar
        new(tag) A[1];   // array of trivially destructible type: no cookie needed
        new(tag) B;      // scalar
        new(tag) B[1];   // array with non-trivial destructor: extra space for the count
    }
    

    On my machine, it prints:

    single: 4
    array: 4
    single: 4
    array: 12
    

    Because B has a non-trivial destructor, the compiler allocates an extra 8 bytes for the array version, to store the number of elements (as this is a 64-bit compilation, the count needs 8 bytes). Since A has a trivial destructor, the array version of A doesn't need this extra space.
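
    If you want to see the cookie directly, here is a minimal sketch (assuming a typical Itanium-ABI-style 64-bit implementation; the exact offset is not guaranteed by the standard) that compares the pointer the array new-expression returns with the raw buffer:

    #include <cstddef>
    #include <iostream>
    
    enum Tag { tag };
    
    char buffer[128];
    
    void *operator new[](std::size_t, Tag) { return buffer; }
    
    struct B {
        int value;
    
        ~B() {}   // non-trivial destructor forces the cookie
    };
    
    int main() {
        B *p = new(tag) B[1];
        // the returned pointer typically lands past the 8-byte cookie
        std::cout << (reinterpret_cast<char *>(p) - buffer) << "\n";  // usually prints 8
    }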


    Note: as Deduplicator comments, there is a slight performance advantage to using the array version if the destructor is virtual: at delete[], the compiler doesn't have to call the destructor virtually, because it knows that the type is T. Here's a simple case to demonstrate this:

    struct Foo {
        virtual ~Foo() { }   // virtual, but empty
    };
    
    void fn_single(Foo *f) {
        delete f;            // f may point to a derived object: virtual dispatch needed
    }
    
    void fn_array(Foo *f) {
        delete[] f;          // the elements must be exactly Foo: no dispatch needed
    }
    

    Clang optimizes this case, but GCC doesn't: godbolt.

    For fn_single, clang emits a nullptr check, then calls the destructor+operator delete function virtually. It must do it this way, as f can point to an object of a derived type, which may have a non-empty destructor.

    For fn_array, clang emits a nullptr check and then calls operator delete directly, without calling the destructor, as it is empty. Here, the compiler knows that f actually points to an array of Foo objects; it cannot be a derived type, hence it can omit the calls to the empty destructors.
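
    In other words, the optimized fn_array behaves roughly like the following hand-written equivalent (an illustration of clang's output, assuming the Itanium-ABI cookie layout; fn_array_equiv is a made-up name, not actual compiler output):

    #include <cstddef>
    #include <new>
    
    void fn_array_equiv(void *f) {
        if (f) {
            // no destructor calls: ~Foo() is known to be empty for every
            // element; just step back over the cookie and free the block
            ::operator delete[](static_cast<char *>(f) - sizeof(std::size_t));
        }
    }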