Search code examples
c#excelspreadsheetopenxmlopenxml-sdk

Find all external relationships in an OOXML/Excel spreadsheet


I am trying to register all external relationships in a OOXML spreadsheet. The code below provides no results. What am I doing wrong? I suspect I am looking the wrong place, I am looking in "WorkSheetParts". I am using Open XML SDK.

using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

using (var spreadsheet = SpreadsheetDocument.Open(file, false))
{
    var list = spreadsheet.WorkbookPart.WorksheetParts.ToList();
    foreach (var item in list)
    {
        int count = item.ExternalRelationships.Count(); // Register the number of embedded objects
        if (count == 0) // If no embedded objects, inform user
        {
            Console.WriteLine($"--> {count} external relationships");
        }
        else
        {
            Console.WriteLine($"--> {count} external relationships");
            var parts = item.ExternalRelationships.ToList(); // Register each object to a list

            foreach (var part in parts) // Inform user of each object
            {
                Console.WriteLine(part.Uri.ToString());
                Console.WriteLine(part.RelationshipType.ToString());
                Console.WriteLine(part.IsExternal.ToString());
                Console.WriteLine(part.Container.ToString());
            }
        }
    }
}

Solution

  • TL;DR: You're traversing the OOXML Spreadsheet document structure incorrectly.

    The quick-fix is to use the built-in GetAllParts() extension method, then for each OpenXmlPart entry inspect its ExternalRelationships property (it's a lazily-evaluated enumerable, not an in-memory collection), and looking for non-empty collections of ExternalRelationship objects... or a single linq expression, like so:

    static List<ExternalRelationship> GetExternalReferences( SpreadsheetDocument spreadsheet )
    {
        return spreadsheet
            .GetAllParts()
            .SelectMany( p => p.ExternalRelationships )
            .ToList();
    }
    

    As a more complete example:

    using System;
    using System.IO;
    using System.Collections.Generic;
    using System.Linq;
    
    using DocumentFormat.OpenXml;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Spreadsheet;
    
    static void Main()
    {
        DirectoryInfo root = new DirectoryInfo( @"Test_Data" );
    
        foreach( FileInfo xlsx in root.GetFiles( "*.xlsx", SearchOption.AllDirectories ) )
        {
            SpreadsheetDocument spreadsheet;
            try
            {
                spreadsheet = SpreadsheetDocument.Open( path: xlsx.FullName, isEditable: false );
            }
            catch( Exception ex )
            {
                Console.WriteLine( "File \"{0}\" cannot be opened: {1}", xlsx.FullName, ex.Message );
                continue;
            }
    
            using( spreadsheet )
            {
                List<ExternalRelationship> extRels = spreadsheet
                    .GetAllParts()
                    .SelectMany( p => p.ExternalRelationships )
                    .ToList();
    
                if( extRels.Count > 0 )
                {
                    Console.WriteLine( "File \"{0}\" has {1:N0} external references:", xlsx.FullName, extRels.Count );
                    foreach( ExternalRelationship r in extRels )
                    {
                        Console.WriteLine( "\tUri             : \"{0}\"", r.Uri );
                        Console.WriteLine( "\tRelationshipType: \"{0}\"", r.RelationshipType );
                        Console.WriteLine( "\tIsExternal      : {0}", r.IsExternal );
                        Console.WriteLine( "\tContainer       : {0}", r.Container );
                        Console.WriteLine();
                    }
                }
                else
                {
                    Console.WriteLine( "File \"{0}\" has no external references.", xlsx.FullName );
                }
            }
        }
    
    }
    

    Output when using files from the Test_Data.zip you posted:

    File "Test_Data\Workbook_strict.xlsx" has no external references.
    File "Test_Data\Workbook_transitional.xlsx" has no external references.
    File "Test_Data\Another folder\123.xlsx" has no external references.
    File "Test_Data\Another folder\Created as Strict_Now Transitional.xlsx" has no external references.
    File "Test_Data\Another folder\Created as Transitional_Now Strict.xlsx" has no external references.
    File "Test_Data\Another folder\Password protected.xlsx" cannot be opened: End of Central Directory record could not be found.
    File "Test_Data\Another folder\With chains to cells in another spreadsheet.xlsx" has 1 external references:
        Uri             : "123.xlsx"
        RelationshipType: "http://schemas.openxmlformats.org/officeDocument/2006/relationships/externalLinkPath"
        IsExternal      : True
        Container       : DocumentFormat.OpenXml.Packaging.ExternalWorkbookPart
    
    File "Test_Data\Another folder\With data connection.xlsx" has no external references.
    File "Test_Data\Another folder\With embedded objects.xlsx" has 1 external references:
        Uri             : "file:///C:/Users/Sepideh/Desktop/Test_Data/Another%20folder/123.xlsx"
        RelationshipType: "http://schemas.openxmlformats.org/officeDocument/2006/relationships/oleObject"
        IsExternal      : True
        Container       : DocumentFormat.OpenXml.Packaging.ExternalWorkbookPart
    
    File "Test_Data\Another folder\Another folder\1234.xlsx" has no external references.
    File "Test_Data\Another folder\Another folder\random1.xlsx" has no external references.
    File "Test_Data\Another folder\Another folder 2\New.xlsx" has no external references.
    File "Test_Data\Another folder\Another folder\Last folder\some_filename.xlsx" has no external references.