Search code examples
c++stlcontainersproxy-classesbitvector

Could multiple proxy classes make up a STL-proof bitvector?


It's well known that std::vector<bool> does not satisfy the Standard's container requirements, mainly because the packed representation prevents T* x = &v[i] from returning a pointer to a bool.

My question is: can this be remedied/mitigated when the reference_proxy overloads the address-of operator& to return a pointer_proxy?

The pointer-proxy could contain the same data as the reference_proxy in most implementations, namely a pointer into the packed data and a mask to isolate the particular bit inside the block pointed to. Indirection of the pointer_proxy would then yield the reference_proxy. Essentially both proxies are "fat" pointers, which are, however, still rather light-weight compared to disk-based proxy containers.

Instead of T* x = &v[0] one could then do auto x = &v[0], and use x like if(*x) without problems. I would also like to be able to write for(auto b: v) { /* ... */ }

Questions: would such a multi-proxy approach work with the STL's algorithms? Or do some algorithms really rely on the requirement that x needs to be a real bool*? Or are there too many consecutive user-defined conversions required that prevent this to work? I'd like to know any of such obstructions before trying to fully complete the above implementation sketch.


UPDATE (based on @HowardHinnant 's answer and this ancient discussion on comp.std.c++)

You can come a long way to almost mimic the builtin types: for any given type T, a pair of proxies (e.g. reference_proxy and iterator_proxy) can be made mutually consistent in the sense that reference_proxy::operator&() and iterator_proxy::operator*() are each other's inverse.

However, at some point one needs to map the proxy objects back to behave like T* or T&. For iterator proxies, one can overload operator->() and access the template T's interface without reimplementing all the functionality. However, for reference proxies, you would need to overload operator.(), and that is not allowed in current C++ (although Sebastian Redl presented such a proposal on BoostCon 2013). You can make a verbose work-around like a .get() member inside the reference proxy, or implement all of T's interface inside the reference (this is what is done for vector::bit_reference), but this will either lose the builtin syntax or introduce user-defined conversions that do not have builtin semantics for type conversions (you can have at most one user-defined conversion per argument).


Solution

  • My question is: can this be remedied/mitigated when the reference_proxy overloads the address-of operator& to return a pointer_proxy?

    libc++ actually does this.

    #include <vector>
    #include <cassert>
    
    int main()
    {
        std::vector<bool> v(1);
        std::vector<bool>::pointer pb = &v[0];
        assert(*pb == false);
        *pb = true;
        assert(v[0] == true);
        std::vector<bool>::const_pointer cbp = pb;
        assert(*cbp == true);
        v[0] = false;
        assert(*cbp == false);
    }
    

    It even extends to const_pointer and const_reference in ways that mimic the same types for vector<int>. This is a non-conforming extension for libc++. But it makes writing generic code which might be instantiated on vector<bool> far more likely to compile and behave correctly.

    Questions: would such a multi-proxy approach work with the STL's algorithms? Or do some algorithms really rely on the requirement that x needs to be a real bool*? Or are there too many consecutive user-defined conversions required that prevent this to work?

    All of libc++'s algorithms work with vector<bool>. Some of them with quite spectacular performance. One algorithm in particular must have special treatment which the standard unfortunately does not mandate:

    #include <vector>
    #include <cassert>
    
    int main()
    {
        std::vector<bool> v(1);
        bool b = true;
        assert(v[0] == false);
        assert(b == true);
        std::swap(b, v[0]);
        assert(v[0] == true);
        assert(b == false);
    }
    

    This is very easy for the implementation to accomplish. One simply needs to make sure swap works for any combination of bool and vector<bool>::reference. But I don't know if any implementation besides libc++ does this, and it is not mandated by C++11.

    An array of bits is a wonderful data structure. But unfortunately it is poorly specified in the C++ standard. libc++ has gone somewhat outlaw to demonstrate that this can be a very useful and high performance data structure. The hope is that a future C++ standard may migrate in this direction to the benefit of the C++ programmer.