Download and Unzip XML file


I would like to download, unzip, and parse an XML file; its URL appears in the code below.

Here is my code:

HttpClientHandler handler = new HttpClientHandler()
{
    CookieContainer = new CookieContainer(),
    UseCookies = true,
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate,
    // | DecompressionMethods.None,
};

using (var http = new HttpClient(handler))
{
    var response =
        http.GetAsync(@"https://login.tradedoubler.com/report/published/aAffiliateEventBreakdownReportWithPLC_806880712_4446152766894956100.xml.zip").Result;

    Stream streamContent = response.Content.ReadAsStreamAsync().Result;

    using (var gZipStream = new GZipStream(streamContent, CompressionMode.Decompress))
    {
        var settings = new XmlReaderSettings()
        {
            DtdProcessing = DtdProcessing.Ignore
        };

        var reader = XmlReader.Create(gZipStream, settings);
        reader.MoveToContent();

        XElement root = XElement.ReadFrom(reader) as XElement;
    }
}

I get an exception on the XmlReader.Create(gZipStream, settings) line:

The magic number in GZip header is not correct. Make sure you are passing in a GZip stream

To double check that I am getting properly formatted data from the web, I grab the stream and save it to a file:

byte[] byteContent = response.Content.ReadAsByteArrayAsync().Result;
File.WriteAllBytes(@"C:\temp\1111.zip", byteContent);

When I examine 1111.zip, it appears to be a well-formed zip file containing the XML that I need.

I was advised here that I do not need GZipStream at all, but if I remove the compression stream from the code completely and pass streamContent directly to the XML reader, I get an exception:

"Data at the root level is invalid. Line 1, position 1."

With or without the decompression stream, I still fail to parse this file. What am I doing wrong?


Solution

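  • The file in question is a PKZip archive, not a GZip stream. GZipStream only understands the GZip format, which is why it rejects the magic number in the header. With no decompression at all, XmlReader receives the raw archive bytes (which begin with "PK") instead of XML, hence "Data at the root level is invalid."

    You'll need a different class to decompress it, such as System.IO.Compression.ZipArchive (which reads from a stream) or System.IO.Compression.ZipFile (which works on files on disk). The handler's AutomaticDecompression setting does not help here: it only undoes HTTP-level Content-Encoding, not an archive sent as the response body.

    You can typically tell the format by the file extension: PKZip files usually end in .zip, while GZip files usually end in .gz.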

    See: Unzip files programmatically in .net
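
As a rough illustration of the approach above, here is a minimal sketch that opens a PKZip archive from a stream with ZipArchive and parses the first .xml entry it finds. The names ZipXmlSketch, ReadFirstXml, and BuildSampleZip are hypothetical helpers made up for this example, and the sketch assumes the archive holds at least one entry whose name ends in .xml. BuildSampleZip only creates a tiny in-memory archive so the code can be tried without touching the network.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class ZipXmlSketch
{
    // Hypothetical helper: parses the first *.xml entry of a PKZip archive.
    // Note that disposing the ZipArchive also disposes zipStream.
    public static XElement ReadFirstXml(Stream zipStream)
    {
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read))
        {
            ZipArchiveEntry entry = archive.Entries.FirstOrDefault(
                e => e.FullName.EndsWith(".xml", StringComparison.OrdinalIgnoreCase));
            if (entry == null)
                throw new InvalidDataException("The archive contains no .xml entry.");

            // Same reader settings as in the question.
            var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore };

            using (Stream entryStream = entry.Open())
            using (XmlReader reader = XmlReader.Create(entryStream, settings))
            {
                return XElement.Load(reader);
            }
        }
    }

    // Hypothetical helper: builds a one-entry archive in memory for a quick local test.
    public static MemoryStream BuildSampleZip()
    {
        var buffer = new MemoryStream();
        using (var archive = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.CreateEntry("report.xml");
            using (var writer = new StreamWriter(entry.Open(), new UTF8Encoding(false)))
            {
                writer.Write("<report><row id=\"1\" /></report>");
            }
        }
        buffer.Position = 0; // rewind so the archive can be read back
        return buffer;
    }
}
```

In the question's code, this would presumably replace the GZipStream block, for example by calling ZipXmlSketch.ReadFirstXml(streamContent) on the response stream. On .NET Framework 4.5 and later the project needs a reference to the System.IO.Compression assembly for ZipArchive to be available.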