c# azure ms-word azure-storage azure-storage-files

Azure Storage File Share loses metadata when files are updated with MS Word


We are using a File Share in an Azure Storage account. As part of our application we assign an ID to every file and store this ID in the file's metadata.

We set this ID with the following code:

    public static void SetId(this CloudFile cloudFile, Guid id)
    {
        cloudFile.Metadata[DocumentDbId] = id.ToString();
        cloudFile.SetMetadata();
    }

However, when this file is edited in Microsoft Word 2013 (all the files are .docx), the metadata is wiped clean and we lose our references.

If I create a text file, assign it an ID in metadata, and then edit it with Notepad, the metadata stays where it should be and is not wiped.

Why does editing with MS Word wipe the metadata, and how can we prevent this from happening? Is there an alternative way to set an arbitrary ID that is not wiped by edits?

UPD: Just to clarify, this is my scenario: I mount a file share as a local drive via net use K: \\myaccount.file.core.windows.net\tests /u:AZURE\myaccount uNrI0yyRxyMx and put a .docx file on the drive. In MS Azure Storage Explorer I right-click the file, add metadata (any metadata), and save it (I also tried this with C# as above, and the result is the same). I check again to verify that the metadata was saved. Then I open the file from the mounted drive in MS Word, make a change, and save it. When I check the metadata on the file afterwards, there is nothing there.

But if I create a .txt file, add metadata, then edit the file with Notepad++ and save it, the metadata is not wiped. So it is something MS Word does that wipes the metadata.


Solution

  • I had confirmation from Microsoft engineer Json Shay that MS Word does funky stuff when it writes to files:

    The reason is that MS Word (and many applications) use the Win32 ReplaceFile() API when saving a file, which is effectively a set of move+move+delete operations. Specifically, MS Word:

    1. Writes the new version of the file into a new temporary file, which contains no properties: ~newfile.docx
    2. Rename existingfile.docx --> existingfile_backup.docx
    3. Rename ~newfile.docx --> existingfile.docx
    4. Delete existingfile_backup.docx

    The properties were written on the original existingfile.docx, which then gets renamed away, and then deleted.

    This is different than notepad, which is modifying the existing file in-place.
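
To make the difference concrete, here is a minimal sketch that replays both save styles against an in-memory stand-in for a share. It does not call Azure or the Win32 ReplaceFile() API. FakeFile, FakeShare, SaveInPlace and SaveByReplace are hypothetical names invented for this illustration, and the only assumption is that metadata is attached to the file entry itself rather than to its name, as the explanation above describes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a file on the share: content plus its own metadata.
class FakeFile
{
    public string Content = "";
    public Dictionary<string, string> Metadata = new Dictionary<string, string>();
}

// Hypothetical stand-in for the share: a simple name -> file map.
class FakeShare
{
    private readonly Dictionary<string, FakeFile> files = new Dictionary<string, FakeFile>();

    public FakeFile Create(string name, string content)
    {
        var file = new FakeFile { Content = content };
        files[name] = file;
        return file;
    }

    public FakeFile Get(string name) { return files[name]; }

    public void Rename(string from, string to)
    {
        files[to] = files[from];
        files.Remove(from);
    }

    public void Delete(string name) { files.Remove(name); }

    // Notepad-style save: the existing entry is modified, so its metadata survives.
    public void SaveInPlace(string name, string newContent)
    {
        files[name].Content = newContent;
    }

    // Replace-style save, following the four steps quoted above.
    public void SaveByReplace(string name, string newContent)
    {
        Create("~newfile.docx", newContent);         // 1. new temp file, no metadata
        Rename(name, "existingfile_backup.docx");    // 2. original renamed away
        Rename("~newfile.docx", name);               // 3. temp file takes the original name
        Delete("existingfile_backup.docx");          // 4. original, with its metadata, deleted
    }
}

static class Demo
{
    public static void Main()
    {
        var share = new FakeShare();
        share.Create("notes.txt", "v1").Metadata["DocumentDbId"] = "id-1";
        share.Create("existingfile.docx", "v1").Metadata["DocumentDbId"] = "id-2";

        share.SaveInPlace("notes.txt", "v2");
        share.SaveByReplace("existingfile.docx", "v2");

        Console.WriteLine(share.Get("notes.txt").Metadata.Count);         // prints 1
        Console.WriteLine(share.Get("existingfile.docx").Metadata.Count); // prints 0
    }
}
```

After the replace-style save, the entry named existingfile.docx is a different file object from the one the ID was written to, which is why the metadata appears to vanish even though nothing explicitly cleared it.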