Tags: c++, filestream, codec

std::codecvt do_out skipping characters after 1:N conversions


I'm trying to write an automatic indenter as a std::codecvt facet, but it skips characters whenever it adds new characters to the stream. I've debugged it and verified that from_next and to_next, as well as from and to, hold the expected values.

I've surely missed something in the spec, but here is my code; maybe you can help me:

  virtual result_t do_out(state_type& state, const intern_type* from, const intern_type* from_end, const intern_type*& from_next,
     extern_type* to, extern_type* to_end, extern_type*& to_next) const override
  {
    auto result = std::codecvt_base::noconv;

    while (from < from_end && to < to_end)
    {
      if (getState(state).missingWhitespaces > 0u && *from != '\n')
      {
        while (getState(state).missingWhitespaces > 0u && to < to_end)
        {
          *to = ' ';
          to++;
          getState(state).missingWhitespaces--;
        }
        
        if (to < to_end)
        {
          result = std::codecvt_base::partial;
        }
        else
        {
          result = std::codecvt_base::partial;
          break;
        }
      }
      else
      {
        *to = *from;
         
        if (*from == '\n')
        {
          getState(state).missingWhitespaces = tabSize * indentLevel;
        }
        
        to++;
        from++;
      }
    }

    from_next = from;
    to_next = to;
   
    return result;
  };

The state object also works properly. The problem only occurs between successive calls to do_out.

Edit: Changing the result after if (to < to_end) to std::codecvt_base::ok doesn't solve the problem either.


Solution

  • After some more digging I found the solution to my problem. I got a detailed explanation of std::codecvt from this website: http://stdcxx.apache.org/doc/stdlibref/codecvt.html

    It turned out that I had forgotten to override these two methods:

    virtual int do_length(state_type& state, const extern_type *from, const extern_type *end, size_t max) const;
    Determines and returns n, where n is the number of elements of extern_type in the source range [from,end) that can be converted to max or fewer characters of intern_type, as if by a call to in(state, from, from_end, from_next, to, to_end, to_next) where to_end == to + max.

    Sets the value of state to correspond to the shift state of the sequence starting at from + n.

    Function do_length must be called under the following preconditions:

    state is either initialized to the beginning of a sequence or equal to the result of the previous conversion on the sequence.

    from <= end is well-defined and true.

    Note that this function does not behave similarly to the C Standard Library function mbsrtowcs(). See the mbsrtowcs.cpp example program for an implementation of this function using the codecvt facet.

    virtual int do_max_length() const throw();

    Returns the maximum value that do_length() can return for any valid combination of its first three arguments, with the fourth argument max set to 1.
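To illustrate why do_max_length() matters for a 1:N conversion, here is a minimal sketch. It is not the facet from the question: IndentFacet and convert_with_sized_buffer are hypothetical names, the facet is stateless, and it indents every line (including blank ones) for brevity. The assumption is that a caller sizes its external buffer as "number of internal characters times max_length()", which is one plausible way a stream buffer might use the value. If max_length() kept the default of 1 for char-to-char, such a buffer would be too small for the inserted spaces.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical, simplified facet: every '\n' is followed by `indent` spaces.
class IndentFacet : public std::codecvt<char, char, std::mbstate_t>
{
public:
  explicit IndentFacet(int indent) : indent_(indent) {}

protected:
  result do_out(state_type&, const intern_type* from, const intern_type* from_end,
                const intern_type*& from_next, extern_type* to, extern_type* to_end,
                extern_type*& to_next) const override
  {
    result r = ok;
    while (from < from_end)
    {
      // A newline expands to itself plus the indentation; anything else is 1:1.
      const std::ptrdiff_t needed = (*from == '\n') ? 1 + indent_ : 1;
      if (to_end - to < needed) { r = partial; break; }  // caller must retry with more room
      *to++ = *from;
      if (*from == '\n')
        for (int i = 0; i < indent_; ++i) *to++ = ' ';
      ++from;
    }
    from_next = from;
    to_next = to;
    return r;
  }

  // The char-to-char base reports "no conversion needed" by default.
  bool do_always_noconv() const noexcept override { return false; }

  // Worst case: one internal character becomes 1 + indent_ external ones.
  int do_max_length() const noexcept override { return 1 + indent_; }

private:
  int indent_;
};

// Sizes the output from max_length(), so a single out() call always fits.
std::string convert_with_sized_buffer(const IndentFacet& facet, const std::string& text)
{
  std::string out(text.size() * facet.max_length(), '\0');
  std::mbstate_t state{};
  const char* from_next = nullptr;
  char* to_next = nullptr;
  const auto r = facet.out(state, text.data(), text.data() + text.size(), from_next,
                           &out[0], &out[0] + out.size(), to_next);
  assert(r == std::codecvt_base::ok);
  (void)r;
  out.resize(static_cast<std::size_t>(to_next - &out[0]));
  return out;
}
```

With an indent of 2, convert_with_sized_buffer turns "a\nb" into "a\n  b", and max_length() reports 3.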

    I implemented them this way and it worked:

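      virtual int do_length(state_type& state, const extern_type* from, const extern_type* end, size_t max) const override
      {
        // Budget of characters that may still be produced, starting at max.
        auto numberOfCharsAbleToCopy = max;

        // Indentation still pending from the previous call uses up part of the budget.
        // The explicit <size_t> avoids a deduction error when the operand types differ.
        numberOfCharsAbleToCopy -= std::min<size_t>(numberOfCharsAbleToCopy, getState(state).missingWhitespaces);

        bool newLineToAppend = false;
        for (auto c = from + getState(state).missingWhitespaces; c < end && numberOfCharsAbleToCopy > 0u; c++)
        {
          if (*c == '\n' && !newLineToAppend)
          {
            newLineToAppend = true;
          }
          else if (*c != '\n' && newLineToAppend)
          {
            // First character after a line break: account for one full indentation.
            numberOfCharsAbleToCopy -= std::min<size_t>(tabSize * indentLevel, numberOfCharsAbleToCopy);

            if (numberOfCharsAbleToCopy == 0u)
            {
              break;
            }

            newLineToAppend = false;
          }
        }

        return static_cast<int>(numberOfCharsAbleToCopy);
      }

      // noexcept replaces the dynamic exception specification throw(), which was removed in C++20.
      virtual int do_max_length() const noexcept override
      {
        // Longest run of characters the indenter inserts at once: one full indentation.
        return static_cast<int>(tabSize * indentLevel);
      }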
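A facet like this can also be checked without a file stream by calling out() directly with a deliberately tiny buffer and resuming from from_next after every call. The sketch below is only an illustration under stated assumptions: StatefulIndenter and convert_in_chunks are hypothetical names, and the pending whitespace count lives in a mutable member rather than in the state object that getState(state) returns in the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>
#include <vector>

// Hypothetical stateful indenter: spaces are emitted lazily, just before the
// first non-newline character that follows a line break.
class StatefulIndenter : public std::codecvt<char, char, std::mbstate_t>
{
public:
  explicit StatefulIndenter(unsigned indent) : indent_(indent) {}

protected:
  result do_out(state_type&, const intern_type* from, const intern_type* from_end,
                const intern_type*& from_next, extern_type* to, extern_type* to_end,
                extern_type*& to_next) const override
  {
    result r = ok;
    while (from < from_end)
    {
      if (pending_ > 0 && *from != '\n')
      {
        while (pending_ > 0 && to < to_end) { *to++ = ' '; --pending_; }
        if (pending_ > 0) { r = partial; break; }  // out of room mid-indentation
      }
      if (to == to_end) { r = partial; break; }    // no room for the character itself
      if (*from == '\n') pending_ = indent_;
      *to++ = *from++;
    }
    from_next = from;
    to_next = to;
    return r;
  }

  bool do_always_noconv() const noexcept override { return false; }
  int do_max_length() const noexcept override { return static_cast<int>(indent_) + 1; }

private:
  unsigned indent_;
  mutable unsigned pending_ = 0;  // assumption: state kept in the facet, not in mbstate_t
};

// Calls out() repeatedly with a buffer of `chunk` bytes (chunk must be >= 1).
std::string convert_in_chunks(const StatefulIndenter& facet, const std::string& text,
                              std::size_t chunk)
{
  std::string output;
  std::vector<char> buf(chunk);
  std::mbstate_t state{};
  const char* from = text.data();
  const char* const from_end = from + text.size();
  while (from < from_end)
  {
    const char* from_next = from;
    char* to_next = buf.data();
    facet.out(state, from, from_end, from_next, buf.data(), buf.data() + buf.size(), to_next);
    assert(from_next != from || to_next != buf.data());  // every call must make progress
    output.append(buf.data(), to_next);
    from = from_next;  // resume exactly where the facet stopped
  }
  return output;
}
```

If the result for a one-byte buffer differs from the result for a large buffer, characters are being lost between calls, which is the symptom described in the question.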