Tags: c#, openxml, openxml-sdk

How to delete all embedded objects in Word and PowerPoint using Open XML SDK?


I am trying to delete all embedded objects from Word and PowerPoint files using the Open XML SDK. I am new to Open XML and not sure whether I am doing this correctly. My intention is to remove every embedded object and every embedded image. Below is the code I have; both snippets cause errors when executed.

Code I tried for deleting all embedded objects in the document:

using (var wdDoc = WordprocessingDocument.Open(wordFilePath, true))
{
    var docPart = wdDoc.MainDocumentPart;
    var document = docPart.Document;
    var embeddedObjectsCount = docPart.EmbeddedObjectParts.Count();
    while (embeddedObjectsCount > 0)
    {
        docPart.DeletePart(docPart.EmbeddedObjectParts.FirstOrDefault());
        embeddedObjectsCount = docPart.EmbeddedObjectParts.Count();
    }
}

Code I tried for deleting all images in the document (this partially works, but only when the document has no embedded objects):

using (var wdDoc = WordprocessingDocument.Open(wordFilePath, true))
{
    var docPart = wdDoc.MainDocumentPart;
    var document = docPart.Document;
    var imageObjectsCount = docPart.ImageParts.Count();
    while (imageObjectsCount > 0)
    {
        docPart.DeletePart(docPart.ImageParts.FirstOrDefault());
        imageObjectsCount = docPart.ImageParts.Count();
    }
}

When I run either snippet, the file becomes corrupted. How can I remove all embedded objects from a Word document without corrupting it?
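A likely cause of the corruption (an assumption, not confirmed in the question): `DeletePart` removes the part and its relationship, but elements in document.xml such as `<o:OLEObject r:id="rId6">` or `<a:blip r:embed="rId5">` still point at the deleted relationship ID. The sketch below uses only LINQ to XML and `System.IO.Compression` (no Open XML SDK) to list such dangling IDs. `DanglingRelationshipCheck`, `FindDanglingIds`, and `FindDanglingIdsInDocx` are hypothetical names, and the .docx helper assumes the default part names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: lists relationship IDs that a part's XML references
// (r:id, r:embed, ...) but that are missing from the part's .rels file.
public static class DanglingRelationshipCheck
{
    // Namespace of relationship-reference attributes (the "r:" prefix).
    static readonly XNamespace RelNs =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Namespace of the <Relationship> elements inside a .rels file.
    static readonly XNamespace PkgRelNs =
        "http://schemas.openxmlformats.org/package/2006/relationships";

    public static List<string> FindDanglingIds(string partXml, string relsXml)
    {
        // Every relationship ID the .rels file still defines.
        var definedIds = new HashSet<string>(
            XDocument.Parse(relsXml).Root
                .Elements(PkgRelNs + "Relationship")
                .Select(r => (string)r.Attribute("Id")));

        // Every r:* attribute value in the part whose ID is no longer defined.
        return XDocument.Parse(partXml).Descendants()
            .Attributes()
            .Where(a => a.Name.Namespace == RelNs && !definedIds.Contains(a.Value))
            .Select(a => a.Value)
            .Distinct()
            .ToList();
    }

    // Assumes the default part names word/document.xml and
    // word/_rels/document.xml.rels; a package may name its main part differently.
    public static List<string> FindDanglingIdsInDocx(string docxPath)
    {
        using var zip = ZipFile.OpenRead(docxPath);
        return FindDanglingIds(ReadEntry(zip, "word/document.xml"),
                               ReadEntry(zip, "word/_rels/document.xml.rels"));
    }

    // Reads a ZIP entry as text (throws if the entry does not exist).
    static string ReadEntry(ZipArchive zip, string name)
    {
        using var reader = new StreamReader(zip.GetEntry(name).Open());
        return reader.ReadToEnd();
    }
}
```

Running `FindDanglingIdsInDocx` on a file produced by the snippets above should, under this assumption, return the IDs that Word can no longer resolve.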

I haven't started on PowerPoint yet, but I expect the approach to be similar to Word.


Solution

  • I managed to find a solution, although I had to dig into the concepts of the Open XML SDK to get there. I am not sure whether this is the optimal solution.

    Goal

    1. Remove all embedded objects in PowerPoint and Word.

    2. Remove all images in PowerPoint and Word.

    For Word

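    // Requires: using System.Linq; using DocumentFormat.OpenXml;
    //           using DocumentFormat.OpenXml.Packaging; using DocumentFormat.OpenXml.Wordprocessing;
    //           using A = DocumentFormat.OpenXml.Drawing; using Ovml = DocumentFormat.OpenXml.Vml.Office;
    using (var wdDoc = WordprocessingDocument.Open(wordFilePath, true))
    {
        var docPart = wdDoc.MainDocumentPart;
        var body = docPart.Document.Body;

        // Snapshot with ToList(): removing elements while enumerating Descendants() throws.
        foreach (var oleObj in body.Descendants<Ovml.OleObject>().ToList())
        {
            // Remove the whole <w:object> (not just <o:OLEObject>) so its VML preview
            // shape, which points at an image part, is removed as well.
            var container = (OpenXmlElement)oleObj.Ancestors<EmbeddedObject>().FirstOrDefault() ?? oleObj;
            container.Remove();
        }

        // Remove DrawingML pictures (<w:drawing> elements that reference an image through a:blip).
        foreach (var drawing in body.Descendants<Drawing>().Where(d => d.Descendants<A.Blip>().Any()).ToList())
        {
            drawing.Remove();
        }

        // Delete the attached files (legacy OLE binaries and embedded Office packages)
        // and the image parts, now that the body no longer references them.
        // Note: headers, footers and VML <w:pict> images are not handled here.
        docPart.DeleteParts(docPart.EmbeddedObjectParts.ToList());
        docPart.DeleteParts(docPart.EmbeddedPackageParts.ToList());
        docPart.DeleteParts(docPart.ImageParts.ToList());
        docPart.Document.Save();
    }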
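Why remove the whole `<w:object>` rather than only `<o:OLEObject>`: a legacy OLE object in document.xml typically looks like `<w:object>` wrapping a `<v:shape>` whose `<v:imagedata r:id="rId5">` is the preview picture, plus an `<o:OLEObject r:id="rId6">` pointing at the embedded payload. Removing only the `<o:OLEObject>` leaves the preview's reference behind. The sketch below swaps in raw LINQ to XML instead of the SDK classes: it removes each `<w:object>` that hosts an `<o:OLEObject>` and returns the relationship IDs those elements referenced, so that only those parts need deleting. `WordOleStripper` is a hypothetical name, and the markup shape is an assumption based on typical Word output.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper operating on the XML of word/document.xml.
public static class WordOleStripper
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace O = "urn:schemas-microsoft-com:office:office";
    static readonly XNamespace R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Removes each <w:object> that hosts an <o:OLEObject> and returns the
    // relationship IDs those elements referenced (OLE payload + VML preview image).
    public static HashSet<string> RemoveOleObjects(XDocument document)
    {
        var removedIds = new HashSet<string>();

        // ToList() snapshots the matches so Remove() does not disturb the query.
        var objects = document.Descendants(W + "object")
            .Where(obj => obj.Descendants(O + "OLEObject").Any())
            .ToList();

        foreach (var obj in objects)
        {
            // Collect every r:* attribute inside the object before removing it.
            foreach (var attr in obj.DescendantsAndSelf().Attributes()
                         .Where(a => a.Name.Namespace == R))
            {
                removedIds.Add(attr.Value);
            }
            obj.Remove();
        }
        return removedIds;
    }
}
```

With the Open XML SDK, each returned ID could then be passed to `docPart.DeletePart(id)`, which avoids deleting image parts that other pictures still use.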
    

    For PowerPoint

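    // Requires: using System.Linq; using DocumentFormat.OpenXml.Packaging;
    //           using DocumentFormat.OpenXml.Presentation;
    using (var ppt = PresentationDocument.Open(powerPointFilePath, true))
    {
        foreach (var slide in ppt.PresentationPart.SlideParts)
        {
            // Remove OLE objects: find the graphic frame that hosts each <p:oleObj>
            // and remove the whole frame (this also removes its children).
            // Distinct() because one frame can contain more than one <p:oleObj>.
            var oleFrames = slide.Slide.Descendants<OleObject>()
                .Select(o => o.Ancestors<GraphicFrame>().FirstOrDefault())
                .Where(frame => frame != null)
                .Distinct()
                .ToList();
            foreach (var frame in oleFrames)
            {
                frame.Remove();
            }

            // Remove pictures (<p:pic>) so no element still points at the image parts.
            // Note: image fills on shapes or backgrounds are not handled here.
            foreach (var picture in slide.Slide.Descendants<Picture>().ToList())
            {
                picture.Remove();
            }

            // Delete embedded objects (legacy OLE binaries and embedded Office packages)
            slide.DeleteParts(slide.EmbeddedObjectParts.ToList());
            slide.DeleteParts(slide.EmbeddedPackageParts.ToList());
            // Delete all pictures
            slide.DeleteParts(slide.ImageParts.ToList());
            slide.Slide.Save();
        }
    }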
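Deleting every `ImagePart` of a slide also deletes images that may still be in use elsewhere on the slide, for example a picture fill on a shape. A more conservative variant, sketched here as an assumption rather than the approach used above, is to remove the objects first and then delete only the image, OLE, and package relationships that the slide XML no longer references. The helper works on the raw slide XML and its `.rels` file with LINQ to XML; `UnreferencedRelationships.Find` is a hypothetical name, and the relationship type URIs are the Transitional ECMA-376 ones (Strict-conformance files use different URIs).

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: relationship IDs of removable kinds that the part's XML no longer uses.
public static class UnreferencedRelationships
{
    static readonly XNamespace RelNs =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    static readonly XNamespace PkgRelNs =
        "http://schemas.openxmlformats.org/package/2006/relationships";

    // Only these relationship types are considered safe to delete when unused;
    // others (slideLayout, notesSlide, ...) are implicit and never referenced by r:id.
    static readonly string[] RemovableTypes =
    {
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image",
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/oleObject",
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/package",
    };

    public static List<string> Find(string partXml, string relsXml)
    {
        // All relationship IDs still referenced by an r:* attribute in the part.
        var referenced = new HashSet<string>(XDocument.Parse(partXml).Descendants()
            .Attributes()
            .Where(a => a.Name.Namespace == RelNs)
            .Select(a => a.Value));

        // Removable relationships that nothing references any more, in .rels order.
        return XDocument.Parse(relsXml).Root
            .Elements(PkgRelNs + "Relationship")
            .Where(r => RemovableTypes.Contains((string)r.Attribute("Type")))
            .Select(r => (string)r.Attribute("Id"))
            .Where(id => !referenced.Contains(id))
            .ToList();
    }
}
```

With the Open XML SDK, each returned ID could then be deleted with `slide.DeletePart(id)` instead of deleting all of `slide.ImageParts`.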