I'm working with .mat files which are saved at the end of a program. The command is save foo.mat
so everything is saved. I'm hoping to determine if the program changes by inspecting the .mat files. I see that from run to run, most of the .mat file is the same, but the field labeled __function_workspace__
changes somewhat.
(I am inspecting the .mat files via scipy.io.loadmat
-- just loading the files and printing them out as plain text and then comparing the text. I found that save -ascii
in Matlab doesn't put string labels on things, so going through Python is roundabout, but I get labels and that's useful.)
I am trying to determine from where these changes originate. Can anyone explain what __function_workspace__
contains? Why would it not be the same from one run of a given program to the next?
The variables I am really interested in are the same, but I worry that I might be overlooking some changes that might come back to bite me. Thanks in advance for any light you can shed on this problem.
EDIT: As I mentioned in a comment, the value of __function_workspace__
is an array of integers. I looked at the elements of the array and it appears that these numbers are ASCII or non-ASCII character codes. I see runs of characters which look like names of variables or functions, so that makes sense. But there are also some characters (non-ASCII) which don't seem to be part of a name, and there are a lot of null (zero) characters too. So aside from seeing names of things in __function_workspace__
, I'm not sure what that stuff is exactly.
SECOND EDIT: I found that after commenting out calls to plotting functions, the content of __function_workspace__
is the same from one run of the program to the next, so that's great. At this point the only difference from one run to the next is that there is a __header__
field which contains a timestamp for the time at which the .mat file was created, which changes from run to run.
THIRD EDIT: I found an article, http://nbviewer.jupyter.org/gist/mbauman/9121961 "Parsing MAT files with class objects in them", about reverse-engineering the __function_workspace__
field. Thanks to Matt Bauman for this very enlightening article and thanks to @mpaskov for the pointer. It appears that __function_workspace__
is an undocumented catch-all for various stuff, only one part of which is actually a "function workspace".
You may want to take a look at DiffPlug. It can do diffs of MAT files and I believe there is a command line interface for it as well.
SciPy's __function_workspace__
refers to a special variable at the end of a MAT file that contains extra data needed for reference types (e.g. table
, string
, handle
, etc.) and various other stuff that is not covered by the official documentation. The name is misleading as it really refers to the "Subsystem" (briefly mentioned in the official spec as an offset in the header).
For example, if you save a reference type, e.g., emptyString = ""
, the resulting .mat
will contain the following two entries:
(1) The variable itself. It looks sort of like a UInt32
matrix, but is actually an Opaque
MCOS Reference
(MATLAB Class Object System) to a string
object at some location in the subsystem.
[0] Compressed (81 bytes, position = 128)
[0] Matrix (144 bytes, position = 0)
[0] UInt32[2] = [17, 0] // Opaque
[1] Int8[11] = ['emptyString'] // Variable Name
[2] Int8[4] = ['MCOS'] // Object Type
[3] Int8[6] = ['string'] // Class Name
[4] Matrix (72 bytes, position = 72)
[0] UInt32[2] = [13, 0] // UInt32
[1] Int32[2] = [6, 1] // Dimensions
[2] Int8[0] = [''] // Variable Name (not needed)
[3] UInt32[6] = [-587202560, 2, 1, 1, 1, 1] // Data (Reference Target)
(2) A UInt8
matrix without name (SciPy renamed this to __function_workspace__
) at the end of the file. Aside from the missing name it looks like a standard matrix, but the data is actually another MAT file (with a reduced header) that contains the real data.
[1] Compressed (251 bytes, position = 217)
[0] Matrix (968 bytes, position = 0)
[0] UInt32[2] = [9, 0] // UInt8
[1] Int32[2] = [1, 920] // Dimensions
[2] Int8[0] = [''] // Variable Name
[3] ... 920 bytes ... // Data (Nested MAT File)
The format of the data is unfortunately completely undocumented and somewhat of a mess. I could post the contents of the Subsystem, but it gets somewhat overwhelming even for such a simple case. It's essentially a MAT file that contains a struct
that contains a special variable (MCOS FileWrapper__
) that contains a cell array with various values, including one that magically encodes various Object Properties
.
Matt Bauman has done some great reverse engineering efforts (Parsing MAT files with class objects in them) that I believe all supporting implementations are based on. The MFL Java library contains a full (read-only) implementation of this (see McosFileWrapper.java).
Some updates on Matt Bauman's post that we found are:
Object Id
field looks like a unique id, but the order seems random and sometimes doesn't match. I don't know what this value is, but completely ignoring it seems to work well :)Segment 5
populated in .fig
files, but I haven't been able to narrow down what's in there yet.Edit: Fyi, once the string
object is correctly parsed and all properties are filled in, the actual string value is encoded in yet another undocumented format (see testDoubleQuoteString)