I have a large string of RTF formatted data. The formatting is sound: put it in an ANSI text file, rename it to *.rtf, and WordPad will display it correctly.
The string is essentially a std::wstring, not a TUnicodeString.
If I do the following, the text is displayed properly with proper color formatting etc:
TStringStream *Stream = new TStringStream(String(MyStr.c_str(), MyStr.size())) ;
Stream->Position = 0 ;
RichEdit1->PlainText = false ;
RichEdit1->Lines->LoadFromStream(Stream) ;
delete Stream ;
All good, it works, but I was thinking of avoiding the memcpy that takes place when the String is created, which would save some resources with particularly huge strings.
My aim was to create a TCustomMemoryStream descendant that takes MyStr as input and uses its internal memory by calling SetPointer((void*)MyStr.c_str(), MyStr.length() * 2 /*Size in bytes*/) during construction.
This saves a memcpy if handled with care (MyStr must outlive the Stream etc.) and it's an easy and quick implementation.
Sadly .. it doesn't work properly and I can't seem to figure out why ? I have a working solution, I could move on .. but it bugs me .. so please enlighten me.
Implemented slightly differently for testing, but it comes down to the same:
TMemoryStream *Stream = new TMemoryStream() ;
Stream->Write((void*)MyStr.c_str(), MyStr.length() * 2 /*Bytes*/) ;
Stream->Position = 0 ;
RichEdit1->PlainText = false ;
RichEdit1->Lines->LoadFromStream(Stream) ;
delete Stream ;
RichEdit is unable to show the formatted text. Instead it shows the plain text (characters spaced out). I understand this to be a case of not getting the encoding right, that makes sense.
So I tell LoadFromStream() what encoding to use:
TMemoryStream *Stream = new TMemoryStream() ;
Stream->Write((void*)MyStr.c_str(), MyStr.length() * 2 /*Bytes*/) ;
Stream->Position = 0 ;
RichEdit1->PlainText = false ;
RichEdit1->Lines->LoadFromStream(Stream, TEncoding::Unicode) ;
delete Stream ;
The text is shown properly now, but still as plain text; the rtf is not parsed. I don't understand why: the text seems to arrive in its entirety, and when I copy it to a text file and compare it to the earlier rtf file, the content is identical.
I figured perhaps the encoding needs a BOM to work properly (since that is the default in TEncoding::Unicode), so I added one for testing:
TMemoryStream *Stream = new TMemoryStream() ;
WORD BOM = 0xFEFF ;
Stream->Write((void*)&BOM, 2) ;
Stream->Write((void*)MyStr.c_str(), MyStr.length() * 2 /*Bytes*/) ;
Stream->Position = 0 ;
RichEdit1->PlainText = false ;
RichEdit1->Lines->LoadFromStream(Stream, TEncoding::Unicode) ;
delete Stream ;
But it doesn't make a difference. So I tried the opposite (passing a TEncoding that doesn't require a BOM):
TMemoryStream *Stream = new TMemoryStream() ;
Stream->Write((void*)MyStr.c_str(), MyStr.length() * 2 /*Bytes*/) ;
Stream->Position = 0 ;
RichEdit1->PlainText = false ;
TUnicodeEncoding *Encoding = new TUnicodeEncoding(false /*UseBOM*/) ;
RichEdit1->Lines->LoadFromStream(Stream, Encoding) ;
delete Encoding ;
delete Stream ;
Sadly, still just plain text
I tried a handful of other things as well in a test app, load in TMemo, Save to stream, load in RichEdit etc. (with various results), I also tried setting an Encoding during TStringStream construction with strange results, but I don't want to clutter this Q with that.
I'd like to understand why TRichEdit is unable to parse the rtf even though it seems to get all data correctly as it displays it in plain text
I'm currently using C++ Builder 12
EDIT 1 - after Remy's input
IOW, it actually converts the String to the specified (or in this case, defaulted) encoding
Oh wow, an even bigger penalty I wasn't aware of.
I thought it was a means to tell TStringStream what the input encoding is rather than what it needs to be after storage.
What I then find very confusing is that when I call LoadFromStream but pass Unicode as encoding:
RichEdit1->Lines->LoadFromStream(Stream, TEncoding::Unicode) ;
It still works properly? So what is LoadFromStream doing with this encoding then? The Stream has been converted to ANSI, and RichEdit itself requires ANSI as well (and that is what seems to be sent to Windows, because it works).
When I try the opposite, no conversion in TStringStream (and hence UTF-16 storage):
TStringStream *Stream =
new TStringStream(String(MyStr.c_str(), MyStr.length()), TEncoding::Unicode, true) ;
but supposed conversion in LoadFromStream:
RichEdit1->Lines->LoadFromStream(Stream) ;
or
RichEdit1->Lines->LoadFromStream(Stream, TEncoding::Default) ;
It doesn't work, so LoadFromStream doesn't use the encoding to convert (as is the case during TStringStream construction)?
And to add to my confusion, you mentioning that Unicode can't be used as input for LoadFromStream would mean that non-latin characters don't get converted to rtf text (unless the default TConversion takes care of it, which it doesn't - I just checked). Well, actually the information was already lost before TConversion could look at it, since the string was converted to ANSI in the working case, and when kept as Unicode, it didn't work at all anyway.
The information gets lost during conversion to ANSI and the characters are shown as question marks.
Which is doubly confusing, since adding special characters to a properly displaying rtf doc:
RichEdit1->Lines->Add(L"你好") ; // Chinese simplified: Nǐ hǎo
works perfectly fine and all content is correctly preserved.
Does this mean that VCL converts special characters to rtf formatted text in the case of Add()? Or does this mean Windows' RichEdit can take Unicode as input when lines are added, which then makes me wonder whether there is also a Unicode version for streaming in (and whether the VCL code is simply not aware of it)?
EDIT 2
Following your suggestion to use TPointerStream, and keeping in mind the RichEdit restrictions, I first create an ANSI string (based on std::string), which I then use in the following way:
TPointerStream *Stream =
new TPointerStream((void*)MyAnsiStr.c_str(), MyAnsiStr.length(), true /*ReadOnly*/) ;
This works well with:
RichEdit1->Lines->LoadFromStream(Stream) ;
but it also (unexpectedly) works well with:
RichEdit1->Lines->LoadFromStream(Stream, TEncoding::Unicode) ;
Which I don't understand, since the input is not Unicode and the output (to RichEdit) is not Unicode either.
My aim was to create a TCustomMemoryStream descendant that takes MyString as input and uses its internal memory
Note that the RTL already has a class for that very purpose - TPointerStream.
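For readers unfamiliar with the idea, here is a minimal sketch in standard C++ of what a non-owning, read-only stream over existing memory looks like. BufferView and its members are hypothetical names chosen for illustration; this is not TPointerStream's actual implementation, only the general technique it is built on.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the idea behind a pointer-based stream: a
// read-only cursor over memory owned by someone else. Nothing is copied
// on construction, so the source buffer must outlive the view.
class BufferView
{
public:
    BufferView(const void* data, std::size_t sizeInBytes)
        : data_(static_cast<const unsigned char*>(data)), size_(sizeInBytes) {}

    // Copies up to 'count' bytes into 'dest', advances the position and
    // returns the number of bytes actually copied (0 at end of data).
    std::size_t Read(void* dest, std::size_t count)
    {
        const std::size_t n = std::min(count, size_ - pos_);
        std::memcpy(dest, data_ + pos_, n);
        pos_ += n;
        return n;
    }

    std::size_t Size() const { return size_; }
    const void* Memory() const { return data_; }

private:
    const unsigned char* data_;
    std::size_t size_;
    std::size_t pos_ = 0;
};
```

The only copy happens in Read(), when the consumer pulls bytes out. The price is the lifetime rule already mentioned in the question: the string must stay alive and unmodified for as long as the view is in use.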
I'd like to understand why TRichEdit is unable to parse the rtf even though it seems to get all data correctly as it displays it in plain text
TStringStream defaults to using TEncoding::Default when storing the String into its memory. IOW, it actually converts the String to the specified (or in this case, defaulted) encoding, and then stores the converted bytes.
And when the TRichEdit::Lines::LoadFromStream() method is loading a TStream, it also assumes TEncoding::Default when no TEncoding is specified explicitly, and no BOM is present in the stream data.
That is why your TStringStream test worked. Your String got converted to an encoding that LoadFromStream() was expecting.
However, on Windows TEncoding::Default is the same as TEncoding::ANSI. If you store UTF-16 encoded bytes in your TMemoryStream, that will not match what TEncoding::ANSI is expecting, so you would have to be explicit about the actual encoding you want to use.
Now, when you did specify the encoding explicitly, things still didn't work, because when PlainText is false, TRichEdit uses SF_RTF without SF_UNICODE when issuing the EM_STREAMIN window message to itself. SF_UNICODE is used only when PlainText is true (SF_TEXT instead of SF_RTF). RTF is a 7-bit ASCII format, and SF_RTF can't handle UTF-16 (which is also why your TStringStream test worked).
When SF_RTF fails, TRichEdit will retry with SF_TEXT and SF_UNICODE instead, which is why you end up with the plain text version of your RTF.
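To make the byte mismatch concrete, here is a minimal sketch in standard C++. StartsWithRtfSignature is a hypothetical helper written for this illustration; it is not what the RichEdit control does internally, it only shows that the ASCII bytes an RTF reader looks for are not there once the same text is stored as UTF-16. char16_t stands in for the 2-byte wchar_t used on Windows.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: true if the buffer starts with the ASCII bytes "{\rtf".
// This only illustrates the byte mismatch; it is not the RichEdit parser.
bool StartsWithRtfSignature(const void* data, std::size_t sizeInBytes)
{
    static const char kSignature[] = "{\\rtf";
    const std::size_t n = sizeof(kSignature) - 1; // exclude the terminating NUL
    return sizeInBytes >= n && std::memcmp(data, kSignature, n) == 0;
}

// The RTF header stored as single-byte characters: 7B 5C 72 74 66 ...
bool AnsiHeaderMatches()
{
    const std::string ansi = "{\\rtf1\\ansi Hello}";
    return StartsWithRtfSignature(ansi.data(), ansi.size());
}

// The same header stored as UTF-16. Every character now occupies two bytes
// (7B 00 5C 00 72 00 ... on a little-endian machine), so the ASCII
// signature is no longer present at the start of the buffer.
bool Utf16HeaderMatches()
{
    const std::u16string utf16 = u"{\\rtf1\\ansi Hello}";
    return StartsWithRtfSignature(utf16.data(),
                                  utf16.size() * sizeof(char16_t));
}
```

The interleaved zero bytes are also the reason the very first TMemoryStream test showed the text with the characters spaced out.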
So, in short, you should not use UTF-16 data when using PlainText=false. But, if you really want to use UTF-16 encoded RTF, you will have to implement a custom TConversion descendant, and then assign that class type to the TRichEdit::DefaultConverter property.
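Regarding the non-Latin characters that turned into question marks during the ANSI conversion: since RTF is 7-bit ASCII, such characters are normally written as \uN escapes inside the RTF itself. The following is a minimal sketch in standard C++, assuming the RTF markup in your string is already plain ASCII and only the text payload contains non-ASCII characters. ToAsciiRtf is a hypothetical helper name, and char16_t again stands in for the 2-byte wchar_t on Windows.

```cpp
#include <string>

// Hypothetical helper: converts RTF held as UTF-16 into 7-bit ASCII RTF.
// ASCII code units are copied unchanged (so existing RTF markup survives);
// every other UTF-16 code unit becomes \uN? where N is the code unit as a
// signed 16-bit decimal and '?' is the fallback character shown by readers
// that do not understand \u.
std::string ToAsciiRtf(const std::u16string& rtf)
{
    std::string out;
    out.reserve(rtf.size());
    for (char16_t unit : rtf)
    {
        if (unit < 0x80)
        {
            out.push_back(static_cast<char>(unit));
        }
        else
        {
            // RTF's \u takes a signed 16-bit value, so code units above
            // 32767 are written as negative numbers.
            int value = unit;
            if (value > 32767)
                value -= 65536;
            out += "\\u" + std::to_string(value) + "?";
        }
    }
    return out;
}
```

For example, the two characters from the Add() test would come out as \u20320?\u22909?. The resulting std::string is plain ASCII, so it could be handed to the stream without any TEncoding involved and without losing characters.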