I am writing a method to extract information from zipped files. Each zip file contains just one text file. The method is intended to return a string array.
I am using DotNetZip, but I am experiencing horrible performance. I have tried to benchmark each step, and it seems to be slow at every step.
The C# code is:
public string[] LoadZipFile(string FileName)
{
    string[] lines = { };
    int start = System.Environment.TickCount;
    this.richTextBoxLOG.AppendText("Reading " + FileName + "... ");
    try
    {
        int nstart;
        nstart = System.Environment.TickCount;
        ZipFile zip = ZipFile.Read(FileName);
        this.richTextBoxLOG.AppendText(String.Format("ZipFile ({0}ms)\n", System.Environment.TickCount - nstart));
        nstart = System.Environment.TickCount;
        MemoryStream ms = new MemoryStream();
        this.richTextBoxLOG.AppendText(String.Format("Memorystream ({0}ms)\n", System.Environment.TickCount - nstart));
        nstart = System.Environment.TickCount;
        zip[0].Extract(ms);
        this.richTextBoxLOG.AppendText(String.Format("Extract ({0}ms)\n", System.Environment.TickCount - nstart));
        nstart = System.Environment.TickCount;
        string filecontents = string.Empty;
        using (var reader = new StreamReader(ms))
        {
            reader.BaseStream.Seek(0, SeekOrigin.Begin);
            filecontents = reader.ReadToEnd().ToString();
        }
        this.richTextBoxLOG.AppendText(String.Format("Read ({0}ms)\n", System.Environment.TickCount - nstart));
        nstart = System.Environment.TickCount;
        lines = filecontents.Replace("\r\n", "\n").Split("\n".ToCharArray());
        this.richTextBoxLOG.AppendText(String.Format("SplitLines ({0}ms)\n", System.Environment.TickCount - nstart));
    }
    catch (IOException ex)
    {
        this.richTextBoxLOG.AppendText(ex.Message + "\n");
    }
    int slut = System.Environment.TickCount;
    this.richTextBoxLOG.AppendText(String.Format("Done ({0}ms)\n", slut - start));
    return (lines);
}
As an example I get this output:
Reading xxxx.zip... ZipFile (0ms)
Memorystream (0ms)
Extract (234ms)
Read (78ms)
SplitLines (187ms)
Done (514ms)
A total of 514 ms. When the same operation is performed in Python 2.6 using this code:
import zipfile

def ReadZip(File):
    z = zipfile.ZipFile(File, "r")
    name = z.namelist()[0]
    return z.read(name).split('\r\n')
It executes in just 89 ms. Any ideas on how to improve performance are very welcome.
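For anyone who wants to reproduce the Python side of the comparison, here is a minimal, self-contained sketch of how such a timing could be taken. It is not the exact script behind the 89 ms figure. It assumes Python 3 rather than 2.6, assumes the text is UTF-8, and uses hypothetical names (`read_zip_lines`, `make_sample_zip`, `data.txt`) plus a generated in-memory archive instead of the real file. Note that `splitlines()` drops the trailing empty string that `split('\r\n')` would return.

```python
import io
import time
import zipfile


def read_zip_lines(source):
    """Return the lines of the first member of a zip archive.

    `source` may be a path or a file-like object. The member is assumed
    to be UTF-8 text (an assumption; the post does not state the encoding).
    """
    with zipfile.ZipFile(source, "r") as archive:
        name = archive.namelist()[0]
        text = archive.read(name).decode("utf-8")
    # splitlines() handles both "\r\n" and "\n" line endings.
    return text.splitlines()


def make_sample_zip(line_count):
    """Build an in-memory archive (hypothetical test data) with one text file."""
    buffer = io.BytesIO()
    payload = "".join("row %d\r\n" % i for i in range(line_count))
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("data.txt", payload)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    sample = make_sample_zip(100000)
    started = time.perf_counter()
    lines = read_zip_lines(sample)
    elapsed_ms = (time.perf_counter() - started) * 1000.0
    print("%d lines in %.1f ms" % (len(lines), elapsed_ms))
```

The timing covers only opening the archive, decompressing, decoding, and splitting, which roughly matches the steps measured in the C# method, so the two numbers are at least comparable in scope.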
Thanks for the suggestions. I ended up changing the code in a few ways:
Removing the logging and exception handling did not change performance much. I looked at SharpZipLib's unzip library, but it looked a little more complicated to implement, and from what I could read in other posts it might offer only a small gain in unzipping. It is now running at around 300 ms.
public List<string> LoadZipFile2(string FileName)
{
    List<string> lines = new List<string>();
    int start = System.Environment.TickCount;
    string debugtext;
    debugtext = "Reading " + FileName + "... ";
    this.richTextBoxLOG.AppendText(debugtext);
    try
    {
        //int nstart = System.Environment.TickCount;
        ZipFile zip = ZipFile.Read(FileName);
        //this.richTextBoxLOG.AppendText(String.Format("ZipFile ({0}ms)\n", System.Environment.TickCount - nstart));
        //nstart = System.Environment.TickCount;
        MemoryStream ms = new MemoryStream();
        //this.richTextBoxLOG.AppendText(String.Format("Memorystream ({0}ms)\n", System.Environment.TickCount - nstart));
        //nstart = System.Environment.TickCount;
        zip[0].Extract(ms);
        zip.Dispose();
        //this.richTextBoxLOG.AppendText(String.Format("Extract ({0}ms)\n", System.Environment.TickCount - nstart));
        //nstart = System.Environment.TickCount;
        using (var reader = new StreamReader(ms))
        {
            reader.BaseStream.Seek(0, SeekOrigin.Begin);
            while (reader.Peek() >= 0)
            {
                lines.Add(reader.ReadLine());
            }
        }
        //this.richTextBoxLOG.AppendText(String.Format("Read ({0}ms)\n", System.Environment.TickCount - nstart));
    }
    catch (IOException ex)
    {
        this.richTextBoxLOG.AppendText(ex.Message + "\n");
    }
    int slut = System.Environment.TickCount;
    this.richTextBoxLOG.AppendText(String.Format("Done ({0}ms)\n", slut - start));
    return (lines);
}