Custom stream manipulators


I have a custom stream, CFileManagerOStream, that inherits from std::ostream. It takes Unicode UTF-16 or UTF-32 strings from a network stream class, CTcpStream, and stores them on disk as UTF-8 strings. The strings are potentially far too large (possibly multiple terabytes) to be converted to UTF-8 in memory, so I think I need to implement a C++ stream manipulator for this job. All the manipulator examples I have found take the entire string and process it, which will not work in my case because of the low-memory requirement. I have all the Unicode conversion code ready; the problem I'm trying to solve is doing the conversion with as little memory as possible.

I was hoping to use the manipulators like this:

CFileManagerOStream outFile("MultipleUtf8Strings.dat"); // Custom std::ostream
...
#ifdef _WINDOWS
CTcpStreamUtf16 largeBlobUtf16Stream;
...
outFile << ToUTF8FromUtf16 << largeBlobUtf16Stream;
#else
CTcpStreamUtf32 largeBlobUtf32Stream;
...
outFile << ToUTF8FromUtf32 << largeBlobUtf32Stream;
#endif

Is this possible, or am I approaching this the wrong way?


Solution

  • I found out that using std::ios_base::iword to store the requested character encoding was the best solution for the problem at hand:

    #include <iostream>
    
    /*!
    \brief Unicode encoding
    */
    enum EUnicodeEnc
    {
        /** UTF-8 character encoding */
        EUnicodeEnc_UTF8 = 1,
    
        /** UTF-16 character encoding */
        EUnicodeEnc_UTF16 = 2,
    
        /** UTF-32 character encoding */
        EUnicodeEnc_UTF32 = 3
    };
    
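    /** Holds the \c std::ios_base::iword index (declaration assumed; it was not shown in the original post) */
    struct SourceStreamEncoding
    {
        static int sourceEnc_xalloc;
    };
    
    /** Allocate the \c std::ios_base::iword storage for use with \c SourceStreamEncoding object instances */
    int SourceStreamEncoding::sourceEnc_xalloc = std::ios_base::xalloc();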
    
    /*!
    \brief Stream I/O manipulator changes the source character encoding to UTF-8
    */
    std::ios_base& FromUtf8(std::ios_base& os) {
        os.iword(SourceStreamEncoding::sourceEnc_xalloc) = EUnicodeEnc_UTF8;
        return os;
    }
    
    /*!
    \brief Stream I/O manipulator changes the source character encoding to UTF-16
    */
    std::ios_base& FromUtf16(std::ios_base& os) {
        os.iword(SourceStreamEncoding::sourceEnc_xalloc) = EUnicodeEnc_UTF16;
        return os;
    }
    
    /*!
    \brief Stream I/O manipulator changes the source character encoding to UTF-32
    */
    std::ios_base& FromUtf32(std::ios_base& os) {
        os.iword(SourceStreamEncoding::sourceEnc_xalloc) = EUnicodeEnc_UTF32;
        return os;
    }
    
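    /*!
    \brief Replaces \c std::ostream::flush() (which is not virtual, so this hides it rather than overrides it)
    \details Converts the buffer to the correct character encoding, then flushes the buffer
    after writing its content to a storage device
    */
    std::ostream &CFileManagerOStream::flush()
    {
        // iword() is a member of this stream; there is no separate "os" object here
        switch (iword(SourceStreamEncoding::sourceEnc_xalloc))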
        {
            case EUnicodeEnc_UTF8:
                characterEncoder.FromUTF8(...);
            break;
            case EUnicodeEnc_UTF16:
                characterEncoder.FromUTF16(...);
            break;
            case EUnicodeEnc_UTF32:
                characterEncoder.FromUTF32(...);
            break;
        }
        return (*this);
    }
    
    // Now I can do as follows:
    int main()
    {
        CFileManagerOStream outFile("MultipleUtf8Strings.dat"); // Custom std::ostream
        ...
    #ifdef _WINDOWS
        CTcpStreamUtf16 largeBlobUtf16Stream;
        ...
        outFile << FromUtf16 << largeBlobUtf16Stream;
    #else
        CTcpStreamUtf32 largeBlobUtf32Stream;
        ...
        outFile << FromUtf32 << largeBlobUtf32Stream;
    #endif
    }
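
To try the iword mechanism on its own, here is a minimal sketch that uses std::ostringstream in place of CFileManagerOStream. The names Encoding, encoding_index, from_utf16, from_utf32, source_encoding and demo_encoding_flag are hypothetical and exist only for this illustration. It shows that the flag is stored per stream, that an untouched stream reads back 0, and that the manipulators themselves write no characters.

```cpp
#include <cassert>
#include <ios>
#include <sstream>

// Hypothetical names for this sketch; they mirror the enum above
// but are not the CFileManagerOStream code.
enum Encoding { kUnset = 0, kUtf8 = 1, kUtf16 = 2, kUtf32 = 3 };

// xalloc() hands out a new index on every call, so call it exactly once.
inline int encoding_index() {
    static const int index = std::ios_base::xalloc();
    return index;
}

// Manipulators: ostream::operator<< accepts ios_base& (*)(ios_base&).
inline std::ios_base& from_utf16(std::ios_base& s) {
    s.iword(encoding_index()) = kUtf16;
    return s;
}

inline std::ios_base& from_utf32(std::ios_base& s) {
    s.iword(encoding_index()) = kUtf32;
    return s;
}

// Reads the flag back; a slot that was never written holds 0.
inline long source_encoding(std::ios_base& s) {
    return s.iword(encoding_index());
}

// Small self-check showing that the flag is per stream and sticky.
inline void demo_encoding_flag() {
    std::ostringstream a, b;
    a << from_utf16;
    assert(source_encoding(a) == kUtf16);
    assert(source_encoding(b) == kUnset);   // b was never touched
    a << "payload" << from_utf32 << "more";
    assert(source_encoding(a) == kUtf32);   // the last manipulator wins
    assert(a.str() == "payloadmore");       // manipulators write no characters
}
```

Because the value stays set until it is changed, a stream that receives data in several encodings needs the manipulator again before each source.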
    

    Additionally, I've added the following manipulator that takes a single parameter:

    class FromEnc
    {
        public:
            explicit FromEnc(int i) : i_(i) {}
            int i_;
        private:
            template <class charT, class Traits>
            friend std::basic_ostream<charT, Traits>& operator<<(std::basic_ostream<charT, Traits>& os, const FromEnc& w) {
                os.iword(SourceStreamEncoding::sourceEnc_xalloc) = w.i_;
                return os;
            }
    };
    

    With that in place, I can also do the following:

    outFile << FromEnc(EUnicodeEnc_UTF16) << largeBlobUtf16Stream;
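
The manipulator only records which encoding the source uses; the low-memory part comes from converting one buffer at a time. The original conversion code is not shown, so the following is only a sketch of one way to do that for UTF-16 input. The class Utf16ToUtf8Chunker and its feed/finish functions are hypothetical names. The one piece of state that has to survive between buffers is a high surrogate whose low half has not arrived yet; malformed input is replaced with U+FFFD, which is an assumption about the desired error handling.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of CFileManagerOStream.
constexpr char32_t kReplacement = 0xFFFD;  // substituted for unpaired surrogates

class Utf16ToUtf8Chunker {
public:
    // Appends the UTF-8 encoding of `count` UTF-16 code units to `out`.
    // A surrogate pair may be split across two calls.
    void feed(const char16_t* units, std::size_t count, std::string& out) {
        for (std::size_t i = 0; i < count; ++i) {
            const char32_t u = units[i];
            if (pending_high_ != 0) {
                const char32_t high = pending_high_;
                pending_high_ = 0;
                if (u >= 0xDC00 && u <= 0xDFFF) {
                    // Pair complete: combine both halves into one code point.
                    append(0x10000 + ((high - 0xD800) << 10) + (u - 0xDC00), out);
                    continue;
                }
                append(kReplacement, out);  // high surrogate without its low half
            }
            if (u >= 0xD800 && u <= 0xDBFF) {
                pending_high_ = u;          // wait for the low half, maybe next chunk
            } else if (u >= 0xDC00 && u <= 0xDFFF) {
                append(kReplacement, out);  // stray low surrogate
            } else {
                append(u, out);
            }
        }
    }

    // Call once at end of input to report a dangling high surrogate.
    void finish(std::string& out) {
        if (pending_high_ != 0) {
            append(kReplacement, out);
            pending_high_ = 0;
        }
    }

private:
    char32_t pending_high_ = 0;

    // Encodes one code point as 1 to 4 UTF-8 bytes.
    static void append(char32_t cp, std::string& out) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
};
```

In a real stream the caller would write `out` to disk and clear it after every feed() call, so memory use is bounded by the chunk size rather than by the length of the string.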