Search code examples
excelgzippowerquerydata-analysism

How to unzip GZ files using Power Query?


For zip files, I found this code and this tutorial that explain how to use the code and it works well.

The problem is that I have gz files and not zip files. The gz file contains a csv.

So I tried udapting some values like using Compression.GZip instead of Compression.Deflate but that didn't help. I'm still getting an empty table.

(GZFile) =>
let
    Header = BinaryFormat.Record([
        MiscHeader = BinaryFormat.Binary(14),
        BinarySize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
        FileSize   = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
        FileNameLen= BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
        ExtrasLen  = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian)    
    ]),
 
    HeaderChoice = BinaryFormat.Choice(
        BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
        each if _ <> 67324752             // not the IsValid number? then return a dummy formatter
            then BinaryFormat.Record([IsValid = false, Filename=null, Content=null])
            else BinaryFormat.Choice(
                    BinaryFormat.Binary(26),      // Header payload - 14+4+4+2+2
                    each BinaryFormat.Record([
                        IsValid  = true,
                        Filename = BinaryFormat.Text(Header(_)[FileNameLen]),
                        Extras   = BinaryFormat.Text(Header(_)[ExtrasLen]),
                        Content  = BinaryFormat.Transform(
                            BinaryFormat.Binary(Header(_)[BinarySize]),
                            // (x) => try Binary.Buffer(Binary.Decompress(x, Compression.Deflate)) otherwise null
                            (x) => try Binary.Buffer(Binary.Decompress(x, Compression.GZip)) otherwise null
                        )
                        ]),
                        type binary                   // enable streaming
                )
    ),
 
    ZipFormat = BinaryFormat.List(HeaderChoice, each _[IsValid] = true),
 
    Entries = List.Transform(
        List.RemoveLastN( ZipFormat(GZFile), 1),
        (e) => [FileName = e[Filename], Content = e[Content] ]
    )
in
    Table.FromRecords(Entries)

Can you guys show me how to modify the code please ?


Solution

  • You can read a CSV in GZIP as follows:

    let
        Source = Binary.Decompress(File.Contents("C:\Users\Dav\Downloads\test.gz"), Compression.GZip),
        #"Imported CSV" = Csv.Document(Source,[Delimiter=",", Encoding=1252])
    
    in
        #"Imported CSV"
    

    You can also read more on Chris Webb's blog.