How to read 'Extended' MS Word file tags without Office.Interop?


I have a .docx file with custom properties that exist only for MS Office files.

If I open the same file on a computer without MS Office installed, there is no Tags property in the file's Details tab.

I need to read Tags in my C# code.

I tried this solution and found that the Tags index is 18. Then I used the following code:

public class TagsReader : ITagsReader
{
    private const int keywordsIndex = 18;

    public string Read(string filePath)
    {
        var fullPath = Path.GetFullPath(filePath);

        var directoryName = Path.GetDirectoryName(fullPath);
        Folder dir = GetShell32Folder(directoryName);
        var fileName = Path.GetFileName(fullPath);

        FolderItem item = dir.ParseName(fileName);
        return dir.GetDetailsOf(item, keywordsIndex);
    }

    private Folder GetShell32Folder(string folderPath)
    {
        var shellAppType = Type.GetTypeFromProgID("Shell.Application");
        var shell = Activator.CreateInstance(shellAppType);
        return (Folder)shellAppType.InvokeMember("NameSpace",
        BindingFlags.InvokeMethod, null, shell, new object[] { folderPath });
    }
}

But it does not work on computers without MS Office installed. It works only for .doc files, not for .docx. For now I use an Interop-based solution, which is unstable, resource-intensive, and requires MS Office to be installed on the server:

public class WordTagsReader : ITagsReader
{
    private readonly string[] availableFileExtensions = { ".docx" };
    public string Read(string filePath)
    {
        var fileExtension = Path.GetExtension(filePath);
        if (!availableFileExtensions.Contains(fileExtension))
            return null;

        dynamic application = null;
        dynamic document = null;
        var tags = string.Empty;
        try
        {
            var typeWord = Type.GetTypeFromProgID("Word.Application");
            application = Activator.CreateInstance(typeWord);
            application.Visible = false;
            application.DisplayAlerts = false;
            var fullFilePath = Path.GetFullPath(filePath);
            document = application.Documents.Open(fullFilePath);
            tags = document.BuiltInDocumentProperties["Keywords"].Value;
        }
        finally
        {
            if (document != null)
            {
                document.Close();
                document = null;
            }
            if (application != null)
            {
                application.Quit();
                application = null;
            }
        }

        return tags;
    }
}

This code crashes from time to time and leaves running instances of MS Word, which consume resources and lock the file. I have many handlers running at the same time, so I can't tell the leftover instances apart from the ones that are working properly in order to clean them up.

That is why I am looking for an alternative solution. Is there a way to read specific (custom) properties like Tags without using Office.Interop?


Solution

  • You can read the .docx format directly, the old-school way. Something like this:

    using System;
    using System.IO;
    using System.IO.Packaging;
    using System.Xml.Linq;

    // ms: a readable, seekable Stream holding the .docx content
    using (var package = Package.Open(ms, FileMode.Open, FileAccess.Read))
    {
        var corePart = package.GetPart(new Uri("/docProps/core.xml", UriKind.Relative));
        XDocument settings;
        using (TextReader tr = new StreamReader(corePart.GetStream()))
            settings = XDocument.Load(tr);

        XNamespace cp = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties";
        // Tags are stored in the cp:keywords element; the cast yields null if it is missing
        var tags = (string)settings.Root.Element(cp + "keywords");
    }
    

    No additional libraries or SDKs are needed (on .NET Framework, System.IO.Packaging lives in the WindowsBase assembly). Only System.IO, only hardcore!
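
As a possible shortcut, the same package API also exposes the core properties directly, so the XML does not have to be parsed by hand. The sketch below is only an illustration under that assumption: `DocxTagsReader` and its `Read` methods are hypothetical names, not part of the original answer, and the value returned for a file without tags is expected to be null.

```csharp
using System;
using System.IO;
using System.IO.Packaging;

// Hypothetical helper for illustration; the class and method names are assumptions.
public static class DocxTagsReader
{
    // Reads the Tags (the core "keywords" property) from a .docx file on disk.
    public static string Read(string filePath)
    {
        using (var stream = File.OpenRead(filePath))
            return Read(stream);
    }

    // Reads the Tags from a readable, seekable stream holding the package.
    public static string Read(Stream stream)
    {
        using (var package = Package.Open(stream, FileMode.Open, FileAccess.Read))
            return package.PackageProperties.Keywords;
    }
}
```

Usage would be something like `var tags = DocxTagsReader.Read(@"C:\docs\report.docx");`, where the path is just a placeholder. Since neither Word nor the shell is involved, there are no background processes left to clean up.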