I have PPTX files created by users with PowerPoint 2016. The slides contain embedded Excel worksheets that I need to access for further processing. My project uses Open XML SDK v2.6.1.
When I pass the embedded object stream to SpreadsheetDocument.Open using the following code:
    using System.Linq;                       // needed for Count()
    using DocumentFormat.OpenXml.Packaging;

    using (PresentationDocument pd = PresentationDocument.Open(pptxFile, true))
    {
        foreach (SlidePart slide in pd.PresentationPart.GetPartsOfType<SlidePart>())
        {
            foreach (EmbeddedObjectPart eoPart in slide.EmbeddedObjectParts)
            {
                // Fails for worksheets embedded by PowerPoint 2016 (see exception below)
                using (SpreadsheetDocument sd = SpreadsheetDocument.Open(eoPart.GetStream(), true))
                {
                    // do some work with worksheets
                    var count = sd.WorkbookPart.WorksheetParts.Count();
                }
            }
        }
    }
I get the following exception:
    System.IO.FileFormatException: File contains corrupted data.
       at System.IO.Packaging.ZipPackage..ctor(Stream s, FileMode packageFileMode, FileAccess packageFileAccess)
       at System.IO.Packaging.Package.Open(Stream stream, FileMode packageMode, FileAccess packageAccess)
       at DocumentFormat.OpenXml.Packaging.OpenXmlPackage.OpenCore(Stream stream, Boolean readWriteMode)
       at DocumentFormat.OpenXml.Packaging.SpreadsheetDocument.Open(Stream stream, Boolean isEditable, OpenSettings openSettings)
       at...
When I extract the PPTX package, rename oleObject1.bin in the embeddings folder to oleObject1.zip and view its file information in WinRAR, it is reported as an SFX Zip volume rather than a regular ZIP archive.
The only way I could get SpreadsheetDocument to open the embedded object stream was to convert the stream into a System.IO.Compression.ZipArchive using the DotNetZip library.
So I have the following questions:
Note: this issue does not occur when the worksheet is embedded in the presentation programmatically using the Open XML SDK.
I finally figured out that, although a tool like WinRAR reports the embedded object as an SFX Zip volume, it is actually an MS-CFB (Compound File Binary) file.
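Before handing an embedded part to SpreadsheetDocument.Open, you can check which container format it really is. The sketch below is a hypothetical helper (EmbeddedFormatSniffer and EmbeddedFormat are illustrative names, not part of the Open XML SDK) that compares the first bytes of a seekable stream with the MS-CFB header signature (D0 CF 11 E0 A1 B1 1A E1) and the ZIP local file header signature (50 4B 03 04).

```csharp
using System;
using System.IO;

// Hypothetical result type for the sniffing helper below.
public enum EmbeddedFormat { Cfb, Zip, Unknown }

// Hypothetical helper: classifies a stream by its leading signature bytes.
public static class EmbeddedFormatSniffer
{
    // MS-CFB (compound file) header signature.
    private static readonly byte[] CfbSignature = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // ZIP local file header signature ("PK\x03\x04"); .xlsx packages normally start with it.
    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    public static EmbeddedFormat Detect(Stream stream)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (!stream.CanSeek) throw new ArgumentException("Stream must be seekable.", nameof(stream));

        long start = stream.Position;
        var header = new byte[CfbSignature.Length];
        int read = 0;
        // Stream.Read may return fewer bytes than requested, so loop until full or end of stream.
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break;
            read += n;
        }
        stream.Position = start; // rewind so the caller can still open the stream from the start

        if (StartsWith(header, read, CfbSignature)) return EmbeddedFormat.Cfb;
        if (StartsWith(header, read, ZipSignature)) return EmbeddedFormat.Zip;
        return EmbeddedFormat.Unknown;
    }

    private static bool StartsWith(byte[] buffer, int length, byte[] signature)
    {
        if (length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (buffer[i] != signature[i]) return false;
        }
        return true;
    }
}
```

Because it is not obvious that every part stream returned by eoPart.GetStream() is seekable, copying it into a MemoryStream and rewinding it before calling Detect is probably the safer pattern; only a Zip result would be worth passing to SpreadsheetDocument.Open.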
You can work with CFB files in the following ways:
Bottom line: Office documents embedded as objects in other Office documents are saved in MS-CFB format. Reading and writing these files has to be done outside the Open XML SDK, either through the Windows API or some other alternative.
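Since the Open XML SDK cannot read MS-CFB containers, any alternative has to parse the compound file itself or delegate to a library that does. As a rough starting point, the sketch below reads the fixed fields at the start of the 512-byte MS-CFB header using only BinaryReader. The field offsets follow the [MS-CFB] specification; CfbHeader is a hypothetical type, and the sketch deliberately stops at the header: it does not walk the FAT, the directory or the mini stream, which a complete reader would need in order to locate the stream holding the embedded workbook.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical type: fixed fields at the start of an MS-CFB header.
// Header only; it does not follow the FAT, DIFAT, directory entries or mini stream.
public sealed class CfbHeader
{
    private static readonly byte[] Signature = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    public ushort MajorVersion { get; private set; }     // 3 (512-byte sectors) or 4 (4096-byte sectors)
    public int SectorSize { get; private set; }          // 1 << sector shift
    public int MiniSectorSize { get; private set; }      // 1 << mini sector shift
    public uint FatSectorCount { get; private set; }
    public uint FirstDirectorySector { get; private set; }
    public uint MiniStreamCutoff { get; private set; }   // streams smaller than this live in the mini stream

    public static CfbHeader Read(Stream stream)
    {
        // BinaryReader reads little-endian values, matching the CFB on-disk byte order.
        using (var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
        {
            byte[] sig = reader.ReadBytes(Signature.Length);      // 0x00 signature
            if (sig.Length != Signature.Length)
                throw new InvalidDataException("Stream too short for a CFB header.");
            for (int i = 0; i < Signature.Length; i++)
            {
                if (sig[i] != Signature[i]) throw new InvalidDataException("Not an MS-CFB file.");
            }

            reader.ReadBytes(16);                          // 0x08 header CLSID (all zeros)
            reader.ReadUInt16();                           // 0x18 minor version
            ushort major = reader.ReadUInt16();            // 0x1A major version
            if (reader.ReadUInt16() != 0xFFFE)             // 0x1C byte order mark
                throw new InvalidDataException("Unexpected CFB byte order.");
            ushort sectorShift = reader.ReadUInt16();      // 0x1E sector shift
            ushort miniSectorShift = reader.ReadUInt16();  // 0x20 mini sector shift
            reader.ReadBytes(6);                           // 0x22 reserved
            reader.ReadUInt32();                           // 0x28 number of directory sectors
            uint fatSectors = reader.ReadUInt32();         // 0x2C number of FAT sectors
            uint firstDirSector = reader.ReadUInt32();     // 0x30 first directory sector location
            reader.ReadUInt32();                           // 0x34 transaction signature
            uint cutoff = reader.ReadUInt32();             // 0x38 mini stream cutoff size

            return new CfbHeader
            {
                MajorVersion = major,
                SectorSize = 1 << sectorShift,
                MiniSectorSize = 1 << miniSectorShift,
                FatSectorCount = fatSectors,
                FirstDirectorySector = firstDirSector,
                MiniStreamCutoff = cutoff,
            };
        }
    }
}
```

Parsing the header by hand mainly shows that the format is readable with plain stream APIs; for real processing, a maintained CFB library or the Windows structured storage API is likely less error-prone than extending this into a full reader.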