For zip files, I found this code and this tutorial that explain how to use the code and it works well.
The problem is that I have gz
files and not zip
files. The gz
file contains a csv
.
So I tried udapting some values like using Compression.GZip instead of Compression.Deflate but that didn't help. I'm still getting an empty table.
(GZFile) =>
let
Header = BinaryFormat.Record([
MiscHeader = BinaryFormat.Binary(14),
BinarySize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
FileSize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
FileNameLen= BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
ExtrasLen = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian)
]),
HeaderChoice = BinaryFormat.Choice(
BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
each if _ <> 67324752 // not the IsValid number? then return a dummy formatter
then BinaryFormat.Record([IsValid = false, Filename=null, Content=null])
else BinaryFormat.Choice(
BinaryFormat.Binary(26), // Header payload - 14+4+4+2+2
each BinaryFormat.Record([
IsValid = true,
Filename = BinaryFormat.Text(Header(_)[FileNameLen]),
Extras = BinaryFormat.Text(Header(_)[ExtrasLen]),
Content = BinaryFormat.Transform(
BinaryFormat.Binary(Header(_)[BinarySize]),
// (x) => try Binary.Buffer(Binary.Decompress(x, Compression.Deflate)) otherwise null
(x) => try Binary.Buffer(Binary.Decompress(x, Compression.GZip)) otherwise null
)
]),
type binary // enable streaming
)
),
ZipFormat = BinaryFormat.List(HeaderChoice, each _[IsValid] = true),
Entries = List.Transform(
List.RemoveLastN( ZipFormat(GZFile), 1),
(e) => [FileName = e[Filename], Content = e[Content] ]
)
in
Table.FromRecords(Entries)
Can you guys show me how to modify the code please ?
You can read a CSV in GZIP as follows:
let
Source = Binary.Decompress(File.Contents("C:\Users\Dav\Downloads\test.gz"), Compression.GZip),
#"Imported CSV" = Csv.Document(Source,[Delimiter=",", Encoding=1252])
in
#"Imported CSV"
You can also read more on Chris Webb's blog.