Integer class wrapper performance

I am looking to remodel an existing library for fixed point numbers. Currently the library is just namespaced functions operating on 32-bit signed integers. I would like to turn this around and create a fixed point class that wraps an integer, but don't want to pay any performance penalty associated with classes for something this fine-grained, as performance is an issue for the use case.

Since the prospective class has such simple data requirements, and no resources, I thought it might be possible to make the class "value oriented", leveraging non-modifying operations and passing instances by value where reasonable. This will be a simple class if implemented, not part of a hierarchy.
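To make the idea concrete, here is a minimal sketch of what such a value-oriented wrapper might look like in C++11. The class name Fixed, the Q16.16 format, and the member names (from_raw, from_int, raw, to_int) are all assumptions chosen for illustration, not the existing library's API.

```cpp
#include <cstdint>

// Hypothetical Q16.16 fixed-point wrapper (name and format are assumptions).
class Fixed
{
public:
    static constexpr int frac_bits = 16;
    static constexpr std::int32_t one = 65536;  // 1.0 in Q16.16

    constexpr Fixed() : raw_(0) {}

    // Named factories avoid accidental implicit conversions from int.
    static constexpr Fixed from_raw(std::int32_t r) { return Fixed(r); }
    static constexpr Fixed from_int(std::int32_t i) { return Fixed(i * one); }

    constexpr std::int32_t raw() const { return raw_; }
    // Division truncates toward zero, also for negative values.
    constexpr std::int32_t to_int() const { return raw_ / one; }

    // Non-modifying operations that take and return instances by value.
    friend constexpr Fixed operator+(Fixed a, Fixed b) { return Fixed(a.raw_ + b.raw_); }
    friend constexpr Fixed operator-(Fixed a, Fixed b) { return Fixed(a.raw_ - b.raw_); }
    friend constexpr Fixed operator*(Fixed a, Fixed b)
    {
        // Widen to 64 bits so the intermediate product cannot overflow.
        return Fixed(static_cast<std::int32_t>(
            (static_cast<std::int64_t>(a.raw_) * b.raw_) / one));
    }
    friend constexpr bool operator==(Fixed a, Fixed b) { return a.raw_ == b.raw_; }

private:
    explicit constexpr Fixed(std::int32_t r) : raw_(r) {}
    std::int32_t raw_;
};

// The wrapper should add no storage on mainstream ABIs.
static_assert(sizeof(Fixed) == sizeof(std::int32_t), "wrapper adds no storage");
// constexpr lets simple expressions be evaluated at compile time.
static_assert((Fixed::from_int(3) * Fixed::from_int(2)).to_int() == 6, "3 * 2 == 6");
```

Because every member is defined in the class body, each one is implicitly inline, which gives the optimizer the best chance of reducing the calls to plain integer instructions.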

I am wondering if it is possible to write an integer wrapper class in such a way that no real performance penalty is incurred compared to using raw integers. I am almost confident that this is the case, but don't know enough about the compilation process to just jump into it.

I know it's said that STL iterators compile down to simple pointer operations, and I would like to achieve something similar with integer operations.

The library will be updated to C++11 as part of a project anyway, so I'm hoping that with constexpr and other new features such as rvalue references, I can push the performance of this class close to that of pure integer operations.

Additionally, any recommendations for benchmarking performance differences between the two implementations would be appreciated.

Solution

What's amusing about this question is that the answer is so compiler-dependent. Using Clang/LLVM:

#include <iostream>
using namespace std;

inline int foo(int a) { return a << 1; }

struct Bar
{
    int a;

    Bar(int x) : a(x) {}

    Bar baz() const { return a << 1; }  // non-modifying, so marked const
};

void out(int x) __attribute__ ((noinline));
void out(int x) { cout << x; }

void out(Bar x) __attribute__ ((noinline));
void out(Bar x) { cout << x.a; }

void f1(int x) __attribute__ ((noinline));
void f1(int x) { out(foo(x)); }

void f2(Bar b) __attribute__ ((noinline));
void f2(Bar b) { out(b.baz()); }

int main(int argc, char** argv)
{
    f1(argc);
    f2(argc);
}

Gives the following IR:

define void @_Z3outi(i32 %x) uwtable noinline {
  %1 = tail call %"class.std::basic_ostream"*
                 @_ZNSolsEi(%"class.std::basic_ostream"* @_ZSt4cout, i32 %x)
  ret void
}

define void @_Z3out3Bar(i32 %x.coerce) uwtable noinline {
  %1 = tail call %"class.std::basic_ostream"*
                 @_ZNSolsEi(%"class.std::basic_ostream"* @_ZSt4cout, i32 %x.coerce)
  ret void
}

define void @_Z2f1i(i32 %x) uwtable noinline {
  %1 = shl i32 %x, 1
  tail call void @_Z3outi(i32 %1)
  ret void
}

define void @_Z2f23Bar(i32 %b.coerce) uwtable noinline {
  %1 = shl i32 %b.coerce, 1
  tail call void @_Z3out3Bar(i32 %1)
  ret void
}

Unsurprisingly, the generated assembly for the two functions is identical:

    .globl  _Z2f1i
    .align  16, 0x90
    .type   _Z2f1i,@function
_Z2f1i:                                 # @_Z2f1i
.Ltmp6:
    .cfi_startproc
# BB#0:
    addl    %edi, %edi
    jmp _Z3outi                 # TAILCALL
.Ltmp7:
    .size   _Z2f1i, .Ltmp7-_Z2f1i
.Ltmp8:
    .cfi_endproc
.Leh_func_end2:


    .globl  _Z2f23Bar
    .align  16, 0x90
    .type   _Z2f23Bar,@function
_Z2f23Bar:                              # @_Z2f23Bar
.Ltmp9:
    .cfi_startproc
# BB#0:
    addl    %edi, %edi
    jmp _Z3out3Bar              # TAILCALL
.Ltmp10:
    .size   _Z2f23Bar, .Ltmp10-_Z2f23Bar
.Ltmp11:
    .cfi_endproc
.Leh_func_end3:

Normally, as long as the class's methods are inlined, the compiler can easily optimize away the this pointer and any references. I don't quite see how GCC could get this wrong.
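As for benchmarking, a rough timing harness along the following lines may be a reasonable starting point. It is only a sketch: Wrapped, step_raw, run_raw, run_wrapped and time_ms are hypothetical names, and the shift-and-xor workload is a stand-in for real fixed-point code. Unsigned arithmetic is used so that wraparound in the loop is well defined.

```cpp
#include <chrono>
#include <cstdint>

// Raw-integer version of the operation under test.
inline std::uint32_t step_raw(std::uint32_t acc, std::uint32_t x)
{
    return (acc << 1) ^ x;
}

// Hypothetical wrapper performing the same operation through a member function.
struct Wrapped
{
    std::uint32_t v;
    Wrapped step(Wrapped x) const { return Wrapped{(v << 1) ^ x.v}; }
};

std::uint32_t run_raw(std::uint32_t seed, std::uint32_t n)
{
    std::uint32_t acc = seed;
    for (std::uint32_t i = 0; i < n; ++i) acc = step_raw(acc, i);
    return acc;
}

std::uint32_t run_wrapped(std::uint32_t seed, std::uint32_t n)
{
    Wrapped acc{seed};
    for (std::uint32_t i = 0; i < n; ++i) acc = acc.step(Wrapped{i});
    return acc.v;
}

// Times a single call of f(seed, n) and returns elapsed milliseconds.
// The computed value is handed back through 'result' so the caller can
// consume it, which keeps the optimizer from discarding the work.
template <typename F>
double time_ms(F f, std::uint32_t seed, std::uint32_t n, std::uint32_t& result)
{
    auto t0 = std::chrono::steady_clock::now();
    result = f(seed, n);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

When using something like this, it helps to take the seed from runtime input (for example argc) so the compiler cannot fold the loop into a constant, to print or otherwise use the result, to build with optimizations enabled, and to repeat each measurement several times. Comparing the generated assembly, as done above, is usually the more conclusive check for something this fine-grained.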