PowerBI .pbix DataMashup Compressed Directory


I am attempting to inspect a Power BI .pbix file using Python's zipfile library.

When unzipping the .pbix archive, I get the following structure:

DataMashup
DataModel
DiagramLayout
Metadata
Report
Report/Layout
Report/StaticResources
Report/StaticResources/SharedResources
Report/StaticResources/SharedResources/BaseThemes
Report/StaticResources/SharedResources/BaseThemes/CY18SU07.json
SecurityBindings
Settings
Version
[Content_Types].xml
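
For reference, a listing like the one above can be produced with Python's zipfile module roughly as follows. This is a minimal sketch: the function names are illustrative and not from the original post, and any file path passed in is a placeholder.

```python
import zipfile


def list_pbix_members(pbix_path):
    """Return the member names of a .pbix file, which is an ordinary zip archive."""
    with zipfile.ZipFile(pbix_path) as pbix:
        return pbix.namelist()


def read_pbix_member(pbix_path, member="DataMashup"):
    """Return the raw bytes of one member, e.g. the DataMashup blob."""
    with zipfile.ZipFile(pbix_path) as pbix:
        return pbix.read(member)
```

Calling `list_pbix_members("report.pbix")` (with `report.pbix` standing in for a real file) returns the names shown in the listing, and `read_pbix_member` returns the DataMashup bytes for further inspection.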

It appears that the DataMashup file in the .pbix archive is some sort of off-brand archive of a directory.

The DataMashup object does not appear to be compressed, as I can easily read xml data when printing the object in the python interpreter.

Using 7zip I am able to access everything within:

DataMashup/
    Config/
        Package.xml
    Formulas/
        Section1.m # m and/or dax looking stuff
[Content_Types].xml

How can I discover the format of the DataMashup archive-within-an-archive?

One clue is in the binary data at the top of the DataMashup object: \x00\x00\x00\x00\x07\x05\x00\x00PK which may indicate pkzip.

Another clue may be this output when attempting to use unzip on the DataMashup file:

$ unzip DataMashup
Archive:  DataMashup
warning [DataMashup]:  6215 extra bytes at beginning or within zipfile

I was able to extract the DataMashup archive on Linux using 7za:

WARNINGS:
There are data after the end of archive

--
Path = DataMashup
Type = zip
WARNINGS:
There are data after the end of archive
Offset = 8
Physical Size = 1303
Tail Size = 5148

Everything is Ok

Archives with Warnings: 1

Warnings: 1
Files: 3
Size:       2040
Compressed: 6459

Despite the warnings, the files appear okay. Unfortunately, this does not help me on Windows.


Solution

  • .pbix files are zip archives, so the first step is to unzip the file. The DataMashup part follows the MS-QDEFF spec.

    The DataMashup file inside the archive is itself an archive; it contains Section1.m, which holds the query source definitions.
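
As a rough illustration of that layout, the sketch below splits a DataMashup blob in Python. It assumes the MS-QDEFF top-level stream is a 4-byte version field followed by four length-prefixed sections (package parts, permissions, metadata, permission bindings), each length being a little-endian 32-bit unsigned integer. That agrees with the header bytes in the question (four zero bytes, a 4-byte value, then `PK` at offset 8), but it has not been checked against every file version. The function names are illustrative.

```python
import io
import struct
import zipfile


def split_datamashup(blob):
    """Split a DataMashup blob into its version and four length-prefixed sections."""
    (version,) = struct.unpack_from("<I", blob, 0)
    offset = 4
    sections = []
    # Assumed order: package parts (a zip), permissions, metadata, permission bindings.
    for _ in range(4):
        (length,) = struct.unpack_from("<I", blob, offset)
        offset += 4
        sections.append(blob[offset:offset + length])
        offset += length
    return version, sections


def read_section1(blob):
    """Return the M source stored in Formulas/Section1.m of the embedded zip."""
    _version, sections = split_datamashup(blob)
    with zipfile.ZipFile(io.BytesIO(sections[0])) as package:
        # utf-8-sig also strips a leading byte order mark if one is present.
        return package.read("Formulas/Section1.m").decode("utf-8-sig")
```

Because the embedded zip is sliced out exactly by its length prefix, the standard zipfile module can open it without the "extra bytes" or "data after the end of archive" warnings that appear when a tool is pointed at the whole DataMashup file. This works the same on Windows.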

    1. Change the file Section1.m.
    2. Repackage DataMashup.
    3. Rezip and change the extension back to .pbix.
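
The three steps above could be sketched in Python as follows. This is a hedged sketch, not a verified procedure: it assumes the DataMashup layout is a 4-byte version, a 4-byte little-endian length, the embedded zip, and then a tail that can be copied unchanged. The function names are illustrative. The result has not been tested against Power BI Desktop and may still be rejected, since this sketch does not touch the permission bindings or the SecurityBindings part.

```python
import io
import struct
import zipfile


def replace_section1(blob, new_m_source):
    """Return a new DataMashup blob whose Formulas/Section1.m holds new_m_source."""
    (old_length,) = struct.unpack_from("<I", blob, 4)
    old_package = blob[8:8 + old_length]
    tail = blob[8 + old_length:]  # permissions, metadata, bindings: copied as-is

    new_buffer = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(old_package)) as src, \
            zipfile.ZipFile(new_buffer, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == "Formulas/Section1.m":
                data = new_m_source.encode("utf-8")
            dst.writestr(info.filename, data)
    new_package = new_buffer.getvalue()

    # Original version field, updated package length, new package, original tail.
    return blob[:4] + struct.pack("<I", len(new_package)) + new_package + tail


def rewrite_pbix(src_path, dst_path, new_datamashup):
    """Copy a .pbix to dst_path, swapping in new DataMashup bytes."""
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename == "DataMashup":
                dst.writestr(info.filename, new_datamashup)
            else:
                dst.writestr(info.filename, src.read(info.filename))
```

Writing straight to a path ending in .pbix avoids the separate rename step. Keeping the original file and writing to a new path makes it easy to fall back if the modified file does not open.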

    Here is a really good tutorial in C#:

    https://www.titanwolf.org/Network/q/8acb9f29-4b28-400b-b8df-cbe523edcb01/y

    And another here, using PowerShell:

    https://querypower.medium.com/extracting-power-queries-41fd73d3d6a2