c# xaml rtf

Extracting text from RTF with text and image


I have a byte array extracted from a WPF RichTextControl, and I want to extract its text. The following code works:

FlowDocument document = new FlowDocument();
TextRange txtRange = null;
using (MemoryStream stream = new MemoryStream(data))
{
    txtRange = new TextRange(document.ContentStart, document.ContentEnd);
    txtRange.Load(stream, DataFormats.XamlPackage);
}

The problem starts when an image is embedded in the RTF. I would still like to extract the text, but the code above throws a XamlParseException in the Load call.

I tried the following approach:

using (RichTextBox rtb = new RichTextBox())
{
  rtb.Rtf = System.Text.Encoding.Default.GetString(data);
  // use rtb.Text
}

but setting rtb.Rtf fails with an ArgumentException. The reason is probably explained here, since GetString indeed does not return the expected RTF but mixed text/binary data with mentions of XAML (the same format is also returned for text-only content, which the previous method extracted successfully). I cannot upgrade the framework.

I don't mind traversing the FlowDocument tree to extract the text, as long as I can find a way to load the document successfully.

Is there another way to read the RTF?


Solution

  • Apparently, when an image is included in the RTF, the code works when it runs on an STA thread, e.g.:

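    // Needs: using System.IO; using System.Threading;
    //        using System.Windows; using System.Windows.Documents;
    Thread t = new Thread(() => Foo(data));
    // The thread must be STA, and the apartment state must be set before Start().
    t.SetApartmentState(ApartmentState.STA);
    t.Start();
    t.Join();

    void Foo(byte[] data)
    {
      FlowDocument document = new FlowDocument();
      TextRange txtRange = null;
      using (MemoryStream stream = new MemoryStream(data))
      {
          txtRange = new TextRange(document.ContentStart, document.ContentEnd);
          txtRange.Load(stream, DataFormats.XamlPackage);
      }
      // txtRange.Text now holds the plain text of the document.
    }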