
libc++ vs VC++: Can non-UTF conversions be done with wstring_convert?


C++11's std::wstring_convert works great* for the standard UTF-8 <-> UTF-16/UCS2/UCS4 conversions. However, when I attempted to instantiate a wstring_convert or wbuffer_convert with a facet not from <codecvt>, it didn't work as expected:

// works as expected
std::wstring_convert<std::codecvt_utf8<wchar_t>> ucs4conv;

// Now, by analogy, I want to try this:
std::wstring_convert<std::codecvt<wchar_t, char, std::mbstate_t>> gbconv(
        new std::codecvt_byname<wchar_t, char, std::mbstate_t>("zh_CN.gb18030"));

Clang++ rejects this with the error "calling a protected destructor of codecvt<> in ~wstring_convert".

Visual Studio accepts it (it lacks that locale, but that's another story), because its wstring_convert hands off lifetime management of the facet pointer to a locale object it holds as a member, and locales know how to delete pointers to all facets.

Is Visual Studio right and libc++ wrong?

* As implemented in clang++-2.9/libc++-svn and Visual Studio 2010 EE SP1; the following example works on both but, sadly, not in GCC: https://ideone.com/hywz6


Solution

  • I am admittedly biased in this answer, but I will attempt to back up my claims with references to N3290 (which is unfortunately no longer publicly available). I will also offer a solution.

    Analysis:

    The synopsis of wstring_convert in [conversions.string]/p2 includes:

    private:
      byte_string byte_err_string;  // exposition only
      wide_string wide_err_string;  // exposition only
      Codecvt *cvtptr;              // exposition only
      state_type cvtstate;          // exposition only
      size_t cvtcount;              // exposition only
    

    The "exposition only" annotation means that wstring_convert doesn't have to have these members, in this order, with this spelling. But "exposition only" members are used to describe the effects of the various member functions, and those specifications are binding.

    And so the question appears to become:

    What is the specification of ~wstring_convert()?

    This is found in p17 of the same section ([conversions.string]):

    ~wstring_convert();

    Effects: The destructor shall delete cvtptr.

    That implies to me that ~Codecvt() must be accessible from ~wstring_convert(), and therefore libc++ is following the C++11 specification.

    I would also agree that this is a royal pain in the butt.

    Solution:

    Giving all of the C++98/03 facets protected destructors has turned out to be very inconvenient. Here's an adaptor that can take any facet and give it a public destructor:

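    #include <utility>  // std::forward

    // Derives publicly from any facet and declares a public destructor,
    // so the result can be deleted through a usable_facet<Facet>*.
    template <class Facet>
    class usable_facet
        : public Facet
    {
    public:
        // Forward whatever the wrapped facet's constructor expects
        // (e.g. a locale name and a refs count for a _byname facet).
        template <class ...Args>
            usable_facet(Args&& ...args)
                : Facet(std::forward<Args>(args)...) {}
        ~usable_facet() {}
    };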
    

    You can now use this general purpose adaptor in your code:

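    // codecvt_byname (not plain codecvt, whose constructor takes only a
    // refs count) is the facet that accepts a locale name.
    typedef usable_facet<std::codecvt_byname<wchar_t, char, std::mbstate_t>> C;
    std::wstring_convert<C> gbconv(new C("zh_CN.gb18030"));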
    

    Hope this helps.