I am trying to delete all embedded objects from Word and PowerPoint files using the Open XML SDK. I am new to Open XML and not sure whether I am doing this correctly. Below is the code I have. My intention is to remove any embedded objects and to delete any embedded images. Both snippets produce errors when executed.
Code I tried for deleting all embedded objects in the document:
using (var wdDoc = WordprocessingDocument.Open(wordFilePath, true))
{
    var docPart = wdDoc.MainDocumentPart;
    var document = docPart.Document;
    var embeddedObjectsCount = docPart.EmbeddedObjectParts.Count();
    while (embeddedObjectsCount > 0)
    {
        docPart.DeletePart(docPart.EmbeddedObjectParts.FirstOrDefault());
        embeddedObjectsCount = docPart.EmbeddedObjectParts.Count();
    }
}
Code I tried for deleting all images in the document (this partially works if the document contains no embedded objects):
using (var wdDoc = WordprocessingDocument.Open(wordFilePath, true))
{
    var docPart = wdDoc.MainDocumentPart;
    var document = docPart.Document;
    var imageObjectsCount = docPart.ImageParts.Count();
    while (imageObjectsCount > 0)
    {
        docPart.DeletePart(docPart.ImageParts.FirstOrDefault());
        imageObjectsCount = docPart.ImageParts.Count();
    }
}
When I run the code above, the file gets corrupted. I would like to know how to remove all embedded objects from a Word document without corrupting the file.
I haven't tried anything with PowerPoint yet, but I expect it to be similar to Word.
I managed to find a solution to my problem. I had to dive into the concepts of the Open XML SDK to get there. However, I am not sure whether this is the optimal solution.
Goal
Remove all embedded objects in PowerPoint and Word.
Remove all images in PowerPoint and Word.
For Word
//using System.Linq;
//using DocumentFormat.OpenXml.Packaging;
//using Ovml = DocumentFormat.OpenXml.Vml.Office;
using (var wdDoc = WordprocessingDocument.Open(wordFilePath, true))
{
    var docPart = wdDoc.MainDocumentPart;
    var document = docPart.Document;
    //Determine whether there are any embedded objects in the document
    var docHasEmbeddedOleObjects = document.Body.Descendants<Ovml.OleObject>().Any();
    if (docHasEmbeddedOleObjects)
    {
        //ToList() takes a snapshot so that elements can be removed while looping
        foreach (var oleObj in document.Body.Descendants<Ovml.OleObject>().ToList())
        {
            oleObj.Remove(); //Remove each OLE object from the document. This removes the object from view in Word.
        }
        //Delete the embedded objects. This removes the actual attached files from the document.
        docPart.DeleteParts(docPart.EmbeddedObjectParts);
        //Delete all pictures in the document
        docPart.DeleteParts(docPart.ImageParts);
    }
}
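A likely reason the original attempts corrupted the file is that deleting a part leaves behind markup that still points at it by relationship ID. The following is a minimal sketch of removing the referencing elements first, not a definitive implementation. The class and method names (WordCleanup, RemoveImagesAndOleObjects) are made up for illustration. It assumes that only the main document body needs cleaning (headers and footers live in separate parts and are not touched), and that a drawing counts as an image when it contains an a:blip element.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using A = DocumentFormat.OpenXml.Drawing;

public static class WordCleanup
{
    // Hypothetical helper, not part of the Open XML SDK.
    public static void RemoveImagesAndOleObjects(string wordFilePath)
    {
        using (var wdDoc = WordprocessingDocument.Open(wordFilePath, true))
        {
            var docPart = wdDoc.MainDocumentPart;
            var body = docPart.Document.Body;

            // w:object wraps an OLE object together with its preview shape.
            foreach (var obj in body.Descendants<EmbeddedObject>().ToList())
                obj.Remove();

            // w:drawing can also hold shapes or charts, so only drawings
            // that contain a picture reference (a:blip) are removed.
            foreach (var drawing in body.Descendants<Drawing>().ToList())
            {
                if (drawing.Descendants<A.Blip>().Any())
                    drawing.Remove();
            }

            // w:pict holds legacy VML pictures.
            foreach (var pict in body.Descendants<Picture>().ToList())
                pict.Remove();

            // With the references gone, drop the parts themselves.
            docPart.DeleteParts(docPart.EmbeddedObjectParts.ToList());
            docPart.DeleteParts(docPart.ImageParts.ToList());

            docPart.Document.Save();
        }
    }
}
```

Each query is materialized with ToList() before anything is removed, so the tree is never modified while it is being enumerated.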
For PowerPoint
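//using System.Linq;
//using DocumentFormat.OpenXml.Packaging;
//using DocumentFormat.OpenXml.Presentation;
using (var ppt = PresentationDocument.Open(powerPointFilePath, true))
{
    //Each slide is stored in its own SlidePart under the presentation part
    var slides = ppt.PresentationPart.SlideParts;
    foreach (var slide in slides)
    {
        //Remove OLE objects
        var oleObjectCount = slide.Slide.Descendants<OleObject>().Count();
        while (oleObjectCount > 0)
        {
            var oleObj = slide.Slide.Descendants<OleObject>().FirstOrDefault();
            var oleObjGraphicFrame = oleObj?.Ancestors<GraphicFrame>().FirstOrDefault();
            if (oleObjGraphicFrame != null)
            {
                oleObjGraphicFrame.RemoveAllChildren();
                oleObjGraphicFrame.Remove();
            }
            else
            {
                //No enclosing graphic frame: remove the element itself so the loop can terminate
                oleObj?.Remove();
            }
            oleObjectCount = slide.Slide.Descendants<OleObject>().Count();
        }
        //Delete embedded objects
        slide.DeleteParts(slide.EmbeddedObjectParts);
        //Delete all pictures
        slide.DeleteParts(slide.ImageParts);
    }
}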
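Deleting every image part on a slide can leave dangling references if a picture element, shape fill or background still points at one of them. Below is a hedged sketch of a more conservative variant: it removes the p:pic elements first, then deletes only those image parts that no remaining a:blip references. The names PowerPointCleanup and RemovePictures are hypothetical, and the sketch assumes it runs after the OLE graphic frames have already been removed and that only slides (not layouts or masters) need cleaning.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using A = DocumentFormat.OpenXml.Drawing;
using P = DocumentFormat.OpenXml.Presentation;

public static class PowerPointCleanup
{
    // Hypothetical helper, not part of the Open XML SDK.
    public static void RemovePictures(string powerPointFilePath)
    {
        using (var ppt = PresentationDocument.Open(powerPointFilePath, true))
        {
            foreach (var slidePart in ppt.PresentationPart.SlideParts)
            {
                // Remove the picture elements (p:pic) shown on the slide.
                foreach (var pic in slidePart.Slide.Descendants<P.Picture>().ToList())
                    pic.Remove();

                // Relationship IDs still referenced by a blip, for example
                // an image used as a shape fill or slide background.
                var usedIds = new HashSet<string>(
                    slidePart.Slide.Descendants<A.Blip>()
                        .Select(blip => blip.Embed?.Value)
                        .Where(id => id != null));

                // Delete only the image parts that nothing refers to any more.
                foreach (var imagePart in slidePart.ImageParts.ToList())
                {
                    if (!usedIds.Contains(slidePart.GetIdOfPart(imagePart)))
                        slidePart.DeletePart(imagePart);
                }

                slidePart.Slide.Save();
            }
        }
    }
}
```

Images that remain referenced are kept on purpose. Removing those as well would mean editing the fill or background markup that uses them, which this sketch does not attempt.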