Can storing unrelated data in the least significant bit of a pointer work reliably?

Let me just say up front that I'm aware that what I'm about to propose is a mortal sin, and that I will probably burn in Programming Hell for even considering it.

That said, I'm still interested in knowing if there's any reason why this wouldn't work.

The situation is: I have a reference-counting smart-pointer class that I use everywhere. It currently looks something like this (note: incomplete/simplified pseudocode):

class IRefCountable
{
public:
    IRefCountable() : _refCount(0) {}
    virtual ~IRefCountable() {}

    void Ref() {_refCount++;}
    bool Unref() {return (--_refCount==0);}

private:
    unsigned int _refCount;
};

class Ref
{
public:
   Ref(IRefCountable * ptr, bool isObjectOnHeap) : _ptr(ptr), _isObjectOnHeap(isObjectOnHeap) 
   { 
      _ptr->Ref();
   }

   ~Ref() 
   {
      if ((_ptr->Unref())&&(_isObjectOnHeap)) delete _ptr;
   }

private:
   IRefCountable * _ptr;
   bool _isObjectOnHeap;
};

Today I noticed that sizeof(Ref)=16. However, if I remove the boolean member variable _isObjectOnHeap, sizeof(Ref) is reduced to 8. That means that for every Ref in my program, there are 7.875 wasted bytes of RAM... and there are many, many Refs in my program.
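For illustration, the doubling comes from alignment padding rather than from the bool itself. Here is a minimal sketch using two hypothetical structs (PtrOnly and PtrAndBool are stand-ins, not part of the code above). The exact sizes are implementation-dependent, though 8 and 16 are what a typical 64-bit desktop compiler reports:

```cpp
// Hypothetical stand-ins for Ref without and with the bool member.
struct PtrOnly {
    void* p;
};

struct PtrAndBool {
    void* p;
    bool  b;   // occupies 1 byte, followed by tail padding
};

// The size of a struct is always a multiple of its alignment, so that
// every element of an array of them stays correctly aligned. The bool
// therefore costs a whole extra pointer-sized slot, not a single byte.
static_assert(sizeof(PtrAndBool) % alignof(PtrAndBool) == 0,
              "size is rounded up to a multiple of the alignment");
static_assert(sizeof(PtrAndBool) > sizeof(PtrOnly),
              "the bool plus its padding makes the struct larger");
```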

Well, that seems like a waste of some RAM. But I really need that extra bit of information (okay, humor me and assume for the sake of the discussion that I really do). And I notice that since IRefCountable is a non-POD class, it will (presumably) always be allocated on a word-aligned memory address. Therefore, the least significant bit of (_ptr) should always be zero.

Which makes me wonder... is there any reason why I can't OR my one bit of boolean data into the least-significant bit of the pointer, and thus reduce sizeof(Ref) by half without sacrificing any functionality? I'd have to be careful to AND out that bit before dereferencing the pointer, of course, which would make pointer dereferences less efficient, but that might be made up for by the fact that the Refs are now smaller, and thus more of them can fit into the processor's cache at once, and so on.
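To make the idea concrete, here is a rough sketch of what the packed version might look like. TaggedRef, get() and isObjectOnHeap() are hypothetical names, and the IRefCountable below is a cut-down stand-in for the class above. It assumes that std::uintptr_t exists (it is optional in the standard), that the pointer-to-integer conversion exposes the low address bit, and that converting the masked integer back yields the original pointer. None of that is guaranteed by the language, only by common desktop implementations:

```cpp
#include <cassert>
#include <cstdint>

// Cut-down stand-in for the IRefCountable class above.
class IRefCountable {
public:
    virtual ~IRefCountable() {}
    void Ref() { ++_refCount; }
    bool Unref() { return --_refCount == 0; }
private:
    unsigned int _refCount = 0;
};

// The trick needs bit 0 of every valid object address to be zero.
static_assert(alignof(IRefCountable) >= 2, "need one spare low bit");

class TaggedRef {
public:
    TaggedRef(IRefCountable* ptr, bool isObjectOnHeap)
        : _bits(reinterpret_cast<std::uintptr_t>(ptr))
    {
        assert((_bits & 1u) == 0);        // catches a misaligned pointer
        if (isObjectOnHeap) _bits |= 1u;  // stash the flag in bit 0
        get()->Ref();
    }

    ~TaggedRef()
    {
        IRefCountable* p = get();
        if (p->Unref() && isObjectOnHeap()) delete p;
    }

    // Copying is left out to keep the sketch short.
    TaggedRef(const TaggedRef&) = delete;
    TaggedRef& operator=(const TaggedRef&) = delete;

    // Mask the flag out before the value is used as a pointer again.
    IRefCountable* get() const
    {
        return reinterpret_cast<IRefCountable*>(
            _bits & ~static_cast<std::uintptr_t>(1));
    }

    bool isObjectOnHeap() const { return (_bits & 1u) != 0; }

private:
    std::uintptr_t _bits;  // pointer value with the flag in bit 0
};

static_assert(sizeof(TaggedRef) == sizeof(std::uintptr_t),
              "the flag adds no storage");
```

Every access has to go through get(), so the masking cost mentioned above is paid on each dereference.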

Is this a reasonable thing to do? Or am I setting myself up for a world of hurt? And if the latter, how exactly would that hurt be visited upon me? (Note that this is code that needs to run correctly in all reasonably modern desktop environments, but it doesn't need to run on embedded machines or supercomputers or anything exotic like that.)

Solution

Any reason? Unless things have changed in the standard lately, the value representation of a pointer is implementation-defined. It is certainly possible that some implementation somewhere may pull the same trick, using these otherwise-unused low bits for its own purposes. It's even more likely that some implementation uses word pointers rather than byte pointers, so instead of two adjacent 16-bit words being at "addresses" 0x8640 and 0x8642, they would be at "addresses" 0x4320 and 0x4321, and the low bit would carry real address information.

One tricky way around the problem would be to make Ref a (de facto) abstract class, so that every instance is actually a RefOnHeap or a RefNotOnHeap. If there are that many Refs around, the extra space used to store the code and metadata for three classes rather than one would be made up for by each Ref being half the size. (In practice this won't work too well: the compiler can omit the vtable pointer only if the class has no virtual methods, and introducing the virtual methods needed to tell the two subclasses apart adds those 4 or 8 bytes back to every instance.)
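A small sketch of why the subclass approach gives the space back. All names here (PlainRef, VirtualRef and the two subclasses as written) are hypothetical, and the size comparison reflects how mainstream compilers implement virtual dispatch, not something the standard promises:

```cpp
// Hypothetical stand-ins for illustration only.
struct IRefCountable;  // only used through a pointer here

// Non-polymorphic: just the pointer, no hidden members.
struct PlainRef {
    IRefCountable* ptr;
};

// Polymorphic base: the on-heap decision moves into the dynamic type.
struct VirtualRef {
    IRefCountable* ptr = nullptr;
    virtual ~VirtualRef() {}
    virtual bool isObjectOnHeap() const = 0;
};

struct RefOnHeap : VirtualRef {
    bool isObjectOnHeap() const override { return true; }
};

struct RefNotOnHeap : VirtualRef {
    bool isObjectOnHeap() const override { return false; }
};

// On typical implementations each polymorphic object carries a hidden
// vtable pointer, so sizeof(RefOnHeap) is two pointers wide again:
// the same footprint as the original Ref with its bool and padding.
```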