
Why does FinalReleaseComObject cause "(InteropProgram) has stopped working"?


I'm trying to read text and images from a Word document and then close it. The difficulty is closing it without Word crashing or leaving multiple WINWORD.exe instances running. When I call Marshal.FinalReleaseComObject(app) on the Word.ApplicationClass instance, Windows shows its generic crash dialog ("Word has stopped working"). I have read many of the answers to "How do I properly clean up Excel interop objects?" and implemented the recommendations, but I still have the issue.

Here is my code. I am only reading one Word file with one page (you may want to skip to "// Cleanup:" where the exception occurs).

    private byte[] GetDocumentText(byte[] wordBytes, string path)
    {
        // Save bytes to word file in temp dir, open, copy info. Then delete the temp file after.

        object x = Type.Missing;
        string ext = Path.GetExtension(path).ToLower();
        string tmpPath = Path.ChangeExtension(Path.GetTempFileName(), ext);
        File.WriteAllBytes(tmpPath, wordBytes);

        // Open temp file with Word Interop:
        Word.ApplicationClass app = new Word.ApplicationClass();
        Word.Documents docs = app.Documents;
        Word.Document doc = docs.Open(tmpPath, x, x, x, x, x, x, x, x, x, x, x, x, x, x);

        doc.ActiveWindow.Selection.WholeStory();
        doc.ActiveWindow.Selection.Copy();
        IDataObject data = Clipboard.GetDataObject();
        string documentText = data.GetData(DataFormats.Text).ToString();

        // Add text to pages.
        byte[] wordDoc = null;
        using (MemoryStream myMemoryStream = new MemoryStream())
        {
            Document myDocument = new Document();
            PdfWriter myPDFWriter = PdfWriter.GetInstance(myDocument, myMemoryStream); // REQUIRED.
            PdfPTable table = new PdfPTable(1);
            myDocument.Open();

            // Create a font that will accept unicode characters.
            BaseFont bfArial = BaseFont.CreateFont(@"C:\Windows\Fonts\Arial.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
            Font arial = new Font(bfArial, 12);

            // If a Hebrew or Arabic character is found, switch the cell to right-to-left.
            PdfPCell page = new PdfPCell(new Paragraph(documentText, arial)) { Colspan = 1 };
            Match rgx = Regex.Match(documentText, @"\p{IsArabic}|\p{IsHebrew}");
            if (rgx.Success) page.RunDirection = PdfWriter.RUN_DIRECTION_RTL;

            table.AddCell(page);

            // Add image to document (Not in order with text...)
            foreach (Word.InlineShape ils in doc.InlineShapes)
            {
                if (ils != null && ils.Type == Word.WdInlineShapeType.wdInlineShapePicture)
                {
                    PdfPCell imageCell = new PdfPCell();
                    ils.Select();
                    doc.ActiveWindow.Selection.Copy();
                    System.Drawing.Image img = Clipboard.GetImage();
                    byte[] imgb = null;
                    using (MemoryStream ms = new MemoryStream())
                    {
                        img.Save(ms, System.Drawing.Imaging.ImageFormat.Jpeg);
                        imgb = ms.ToArray();
                    }

                    Image wordPic = Image.GetInstance(imgb);
                    imageCell.AddElement(wordPic);
                    table.AddCell(imageCell);
                }
            }

            myDocument.Add(table);
            myDocument.Close();
            myPDFWriter.Close();
            wordDoc = myMemoryStream.ToArray();
        }

        // Cleanup:
        Clipboard.Clear();

        (doc as Word._Document).Close(Word.WdSaveOptions.wdDoNotSaveChanges, x, x);
        Marshal.FinalReleaseComObject(doc);
        Marshal.FinalReleaseComObject(docs);
        (app as Word._Application).Quit(x, x, x);
        Marshal.FinalReleaseComObject(app); // Word encounters exception here.

        doc = null;
        docs = null;
        app = null;
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        GC.WaitForPendingFinalizers();

        try { File.Delete(tmpPath); }
        catch { }

        return wordDoc;
    }

This doesn't always happen the first time I read the file. When I read it a second or third time, I usually get the error.

Is there any way I can prevent the error from showing?


Solution

  • Seeing this crash is fairly unusual; Word normally copes with this kind of sledgehammer approach to memory management. Nevertheless, it is a very bad practice, best described by this blog post from the Visual Studio team. The whole post is worth reading; the "silent assassin" section is the most relevant.

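    Calling GC.Collect() is enough to get all the COM references released; no additional help is required. That, however, doesn't work if you run your program with the debugger attached. This answer explains why. In short, with a debugger attached the JIT keeps local variables alive until the end of the method, so app, docs and doc still count as live references when GC.Collect() runs inside that same method.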
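
    The lifetime effect can be observed without Word at all. The following is a minimal sketch, not part of the original answer: LifetimeDemo and its methods are hypothetical names, a plain object stands in for a COM wrapper, and GC.KeepAlive is used to imitate what the debugger does when it keeps a local alive to the end of the method.

    ```csharp
    using System;
    using System.Runtime.CompilerServices;

    static class LifetimeDemo
    {
        // Hypothetical stand-in for a method that creates interop wrappers.
        // NoInlining keeps the local in its own stack frame.
        [MethodImpl(MethodImplOptions.NoInlining)]
        static WeakReference CreateAndForget()
        {
            var wrapper = new object();         // stands in for a COM wrapper
            return new WeakReference(wrapper);  // tracks it without keeping it alive
        }

        // Collects while a local in this same method still references the object.
        public static bool AliveWhileReferenced()
        {
            var wrapper = new object();
            var weak = new WeakReference(wrapper);
            GC.Collect();
            GC.WaitForPendingFinalizers();
            bool alive = weak.IsAlive;
            GC.KeepAlive(wrapper);  // keeps 'wrapper' reachable up to this call
            return alive;
        }

        // Collects only after the method that owned the local has returned.
        public static bool AliveAfterOwnerReturned()
        {
            WeakReference weak = CreateAndForget();
            GC.Collect();
            GC.WaitForPendingFinalizers();
            return weak.IsAlive;
        }

        static void Main()
        {
            Console.WriteLine("alive while referenced: " + AliveWhileReferenced());
            Console.WriteLine("alive after owner returned: " + AliveAfterOwnerReturned());
        }
    }
    ```

    The first method reports that the object survived the collection, because a live local still points at it. The second usually reports that the object is gone, since nothing references it once CreateAndForget() has returned, although the exact timing is up to the collector.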

    To get GC.Collect() to work in the debugger as well, you need to call it from a separate method, one whose locals don't hold the references, so that the debugger can't keep them alive. That's easiest done like this:

    private byte[] GetDocumentText(byte[] wordBytes, string path) {
       var retval = GetDocumentTextImpl(wordBytes, path);
       GC.Collect();
       GC.WaitForPendingFinalizers();
       return retval;
    }
    
    private byte[] GetDocumentTextImpl(byte[] wordBytes, string path) {
       // etc...
    }
    

    Move your original code into the GetDocumentTextImpl() method, and delete all the Marshal and GC calls from it: they are completely unnecessary, and dangerous.
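
    As a rough sketch of what the trimmed method might look like: the Word calls below are copied from the question, the PDF-building section is elided rather than reproduced, and the enclosing class name DocumentReader is hypothetical. It assumes a reference to the Word interop assembly and, like the question, uses Word.ApplicationClass.

    ```csharp
    using System;
    using System.IO;
    using Word = Microsoft.Office.Interop.Word;

    class DocumentReader  // hypothetical container for the method
    {
        private byte[] GetDocumentTextImpl(byte[] wordBytes, string path)
        {
            object x = Type.Missing;
            string ext = Path.GetExtension(path).ToLower();
            string tmpPath = Path.ChangeExtension(Path.GetTempFileName(), ext);
            File.WriteAllBytes(tmpPath, wordBytes);

            Word.ApplicationClass app = new Word.ApplicationClass();
            Word.Documents docs = app.Documents;
            Word.Document doc = docs.Open(tmpPath, x, x, x, x, x, x, x, x, x, x, x, x, x, x);

            byte[] wordDoc = null;
            // ... copy the text and images and build the PDF exactly as in the
            // question, assigning the result to wordDoc ...

            // Cleanup: close and quit only. No Marshal.FinalReleaseComObject,
            // no nulling of locals, no GC calls inside this method.
            (doc as Word._Document).Close(Word.WdSaveOptions.wdDoNotSaveChanges, x, x);
            (app as Word._Application).Quit(x, x, x);

            try { File.Delete(tmpPath); }
            catch { }

            return wordDoc;
        }
    }
    ```

    Once this method returns, its locals no longer exist, so the GC.Collect() call in the GetDocumentText() wrapper is free to release the COM references.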