c++ boost iostream boost-iostreams

Reinterpret a narrow (char) input stream as a wide (wchar_t) stream


I'm given a std::istream that contains a UTF-16 encoded string. Imagine a UTF-16 encoded text file that has been opened like this:

std::ifstream file( "mytext_utf16.txt", std::ios::binary );

I want to pass this stream to a function that takes a std::wistream& parameter. I cannot change the file stream type to std::wifstream.

Question: Are there any facilities in the standard or boost libraries that enable me to "reinterpret" the istream as a wistream?

I'm imagining an adapter class similar to std::wbuffer_convert except that it shouldn't do any encoding conversion. Basically for each wchar_t that is read from the adapter class, it should just read two bytes from the associated istream and reinterpret_cast them to wchar_t.

I have created an implementation using boost::iostreams that can be used like this and works like a charm:

std::ifstream file( "mytext_utf16.txt", std::ios::binary );

// Create an instance of my adapter class.
reinterpret_as_wide_stream< std::ifstream > wfile( &file );

// Read a wstring from file, using the adapter.
std::wstring str;
std::getline( wfile, str );

Why am I asking then? Because I like to reuse existing code instead of reinventing the wheel.


Solution

  • This is work in progress

    This is not something you should use as is, but it may be a hint for where to start if you haven't thought about this approach yet. If it is not helpful, or if you work out a better solution, I am glad to remove or extend this answer.

    As far as I understand you want to read a UTF-8 file and simply cast each single character into wchar_t.

    If the standard facilities do too much, couldn't you write your own facet?

    #include <codecvt>
    #include <cwchar>
    #include <fstream>
    #include <iostream>
    #include <locale>
    
    class MyConvert
    {
     public:
      using state_type = std::mbstate_t;
      using result = std::codecvt_base::result;
      using From = char;
      using To = wchar_t;
      bool always_noconv() const noexcept {
        return false;
      }
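      result in(state_type& __state, const From* __from,
        const From* __from_end, const From*& __from_next,
        To* __to, To* __to_end, To*& __to_next) const
      {
        // The *_next parameters are outputs: start them at the range beginnings.
        __from_next = __from;
        __to_next = __to;
        while (__from_next != __from_end) {
          // Stop when the output range is full instead of writing past it.
          if (__to_next == __to_end)
            return result::partial;
          *__to_next = static_cast<To>(*__from_next);
          ++__to_next;
          ++__from_next;
        }
        return result::ok;
      }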
      result out(state_type& __state, const To* __from,
          const To* __from_end, const To*& __from_next,
          From* __to, From* __to_end, From*& __to_next) const
      {
        while (__from_next < __from_end) {
          std::cout << __from << " " << __from_next << " " << __from_end << " " << (void*)__to << 
            " " << (void*)__to_next << " " << (void*)__to_end << std::endl;
          if (__to_next >= __to_end) {
            std::cout << "partial" << std::endl;
            std::cout << "__from_next = " << __from_next << " to_next = " <<(void*) __to_next << std::endl;
            return result::partial;
          }
          To* tmp = reinterpret_cast<To*>(__to_next);
          *tmp = *__from_next;
          ++tmp;
          ++__from_next;
          __to_next = reinterpret_cast<From*>(tmp);
        }
        return result::ok;
      }
    };
    
    int main() {
      std::ofstream of2("test2.out");
      std::wbuffer_convert<MyConvert, wchar_t> conv(of2.rdbuf());
      std::wostream wof2(&conv);
      wof2 << L"сайт вопросов и ответов для программистов";
      wof2.flush();
    }
    

    This is nothing you should use in your code. If it goes in the right direction, you need to read the documentation, including what is required of this facet, what all these pointers mean, and how you need to write to them.

    If you want to use something like this, you need to think about which template arguments you should use for the facet (if any).
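
    On the template arguments: one option is to derive from the standard std::codecvt<wchar_t, char, std::mbstate_t> facet and override its protected do_* hooks, rather than writing a free-standing class. Below is a minimal sketch of the input direction only, assuming little-endian UTF-16 and that each 16-bit code unit is simply stored in one wchar_t (no BOM or surrogate handling). The class name utf16le_codecvt is made up for this illustration.

    ```cpp
    #include <cstddef>
    #include <cwchar>
    #include <locale>

    // Hypothetical facet: treats the external byte sequence as UTF-16LE and
    // stores each 16-bit code unit in one wchar_t, with no further conversion.
    class utf16le_codecvt : public std::codecvt<wchar_t, char, std::mbstate_t> {
     public:
      explicit utf16le_codecvt(std::size_t refs = 0)
          : std::codecvt<wchar_t, char, std::mbstate_t>(refs) {}

     protected:
      bool do_always_noconv() const noexcept override { return false; }

      result do_in(state_type&, const char* from, const char* from_end,
                   const char*& from_next, wchar_t* to, wchar_t* to_end,
                   wchar_t*& to_next) const override {
        // from_next and to_next are pure output parameters, so set them first.
        from_next = from;
        to_next = to;
        // Consume two bytes per wchar_t while input and output space remain.
        while (from_end - from_next >= 2 && to_next != to_end) {
          const unsigned lo = static_cast<unsigned char>(from_next[0]);
          const unsigned hi = static_cast<unsigned char>(from_next[1]);
          *to_next++ = static_cast<wchar_t>(lo | (hi << 8));
          from_next += 2;
        }
        // partial: the output range is full, or a dangling odd byte remains.
        return from_next == from_end ? ok : partial;
      }
    };
    ```

    The output direction (do_out, do_unshift) is left at the base-class behaviour here and would have to be overridden in the same way before such a facet could be used for writing.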

    Update: I've now updated my code. The out function is now closer to what we want, I think. It is not pretty and is only test code, and I am still unsure why __from_next is not updated (or kept).

    Currently the problem is that we cannot write to the stream. With gcc we just fall out of the sync of the wbuffer_convert; with clang we get a SIGILL.
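
    If the facet route stays troublesome, the adapter from the question can also be written directly as a stream buffer, without codecvt or wbuffer_convert. Here is a rough sketch, assuming little-endian UTF-16 input with no BOM or surrogate handling, so each pair of bytes becomes exactly one wchar_t. The names utf16le_wide_buf and read_wide_line are hypothetical and not part of the standard library or Boost.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <istream>
    #include <sstream>
    #include <streambuf>
    #include <string>

    // Hypothetical adapter: exposes a byte stream buffer holding UTF-16LE code
    // units as a wide stream buffer, one wchar_t per two bytes read.
    class utf16le_wide_buf : public std::wstreambuf {
     public:
      explicit utf16le_wide_buf(std::streambuf* src) : src_(src) {
        assert(src_ != nullptr);
        setg(buf_, buf_, buf_);  // start with an empty get area
      }

     protected:
      int_type underflow() override {
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());
        std::size_t n = 0;
        while (n < kSize) {
          const int lo = src_->sbumpc();
          if (lo == std::char_traits<char>::eof()) break;
          const int hi = src_->sbumpc();
          if (hi == std::char_traits<char>::eof()) break;  // odd trailing byte is dropped
          buf_[n++] = static_cast<wchar_t>(static_cast<unsigned>(lo) |
                                           (static_cast<unsigned>(hi) << 8));
        }
        if (n == 0) return traits_type::eof();
        setg(buf_, buf_, buf_ + n);
        return traits_type::to_int_type(*gptr());
      }

     private:
      static constexpr std::size_t kSize = 256;
      std::streambuf* src_;
      wchar_t buf_[kSize];
    };

    // Usage sketch: read one line of wide text from a narrow byte stream.
    std::wstring read_wide_line(std::istream& bytes) {
      utf16le_wide_buf buf(bytes.rdbuf());
      std::wistream in(&buf);
      std::wstring line;
      std::getline(in, line);
      return line;
    }
    ```

    The std::wistream built on top of this buffer can be passed to any function taking a std::wistream&, while the underlying stream stays a plain std::ifstream opened in binary mode.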