
What is different with the writing in FileStream?


When I searched for a way to decompress a file using SharpZipLib, I found a lot of methods like this:

public static void TarWriteCharacters(string tarfile, string targetDir)
{
    using (TarInputStream s = new TarInputStream(File.OpenRead(tarfile)))
    {
        //some codes here

        using (FileStream fileWrite = File.Create(targetDir + directoryName + fileName))
        {                          
            int size = 2048;
            byte[] data = new byte[2048];
            while (true)
            {
                size = s.Read(data, 0, data.Length);
                if (size > 0)
                {
                    fileWrite.Write(data, 0, size);
                }
                else
                {
                    break;
                }
            }
            fileWrite.Close();
        }
    }
}

The signature of FileStream.Write is:

FileStream.Write(byte[] array, int offset, int count)

Now I am trying to separate the read part from the write part, because I want to use a thread to speed up decompression in the write function. I use the dynamic collections List<byte[]> and List<int> to store the file's data and the size of each chunk, as shown below.

Read:

public static void TarWriteCharacters(string tarfile, string targetDir)
{
    using (TarInputStream s = new TarInputStream(File.OpenRead(tarfile)))
    {
        //some codes here

        using (FileStream fileWrite= File.Create(targetDir + directoryName + fileName))
        {                          
            int size = 2048;

            List<int> SizeList = new List<int>();
            List<byte[]> mydatalist = new List<byte[]>();

            while (true)
            {
                byte[] data = new byte[2048];
                size = s.Read(data, 0, data.Length);

                if (size > 0)
                {
                    mydatalist.Add(data);
                    SizeList.Add(size);
                }
                else
                {
                    break;
                }
            }
            test = new Thread(() =>
                FileWriteFun(pathToTar, args, SizeList, mydatalist)
            );
            test.Start();
            streamWriter.Close();
        }
    }
}

Write:

public static void FileWriteFun(string pathToTar , string[] args, List<int> SizeList, List<byte[]> mydataList)
{
    //some codes here

    using (FileStream fileWrite= File.Create(targetDir + directoryName + fileName))
    {
        for (int i = 0; i < mydataList.Count; i++)
        {
            fileWrite.Write(mydataList[i], 0, SizeList[i]);
        }
        fileWrite.Close();
    }
}

Edit

(1) Moved byte[] data = new byte[2048] into the while loop so that each read goes into a new array.

(2) Changed int[] SizeList = new int[2048] to List<int> SizeList = new List<int>() because the fixed-size array could run out of entries.


Solution

  • A Read on a stream is only guaranteed to return one byte (typically it returns more, but you can't rely on getting the full requested length each time), so your solution can theoretically fail after 2048 bytes, because your SizeList can only hold 2048 entries.

    You could use a List to hold the sizes.

    Or use a MemoryStream instead of inventing your own.

    But the two main problems are: 1) You keep reading into the same byte array, overwriting previously read data. Each buffer you add to mydatalist must be a new byte array. 2) You close your stream before the second thread is done writing.

    In general, threading is difficult and should only be used where you know it will improve performance. Simply reading and writing data is typically IO bound, not CPU bound, so introducing a second thread will just add a small performance penalty with no gain in speed. You could use multithreading to ensure concurrent read/write operations, but most likely the disk cache will do this for you if you stick to the first solution, and if not, using async is easier than multithreading to achieve this.
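As a rough illustration of the two suggestions above (a MemoryStream instead of hand-rolled lists, and async instead of a second thread), here is a minimal sketch. It assumes the source is any readable Stream whose Read returns 0 at the end of the data, as the loop in the question assumes. The names StreamCopySketch, BufferAll and CopyToFileAsync are made up for this example and are not part of SharpZipLib or the .NET libraries.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class StreamCopySketch
{
    // Hypothetical helper: buffers everything from 'source' in memory.
    // MemoryStream grows as needed, so no separate list of sizes is required,
    // and Stream.CopyTo copes with reads that return fewer bytes than requested.
    public static byte[] BufferAll(Stream source)
    {
        using (MemoryStream buffer = new MemoryStream())
        {
            source.CopyTo(buffer);
            return buffer.ToArray();
        }
    }

    // Hypothetical helper: copies 'source' to a file without a second thread.
    // The target stream is disposed only after the awaited copy has finished,
    // so it cannot be closed while data is still being written.
    public static async Task CopyToFileAsync(Stream source, string targetPath)
    {
        using (FileStream target = new FileStream(
            targetPath, FileMode.Create, FileAccess.Write, FileShare.None,
            bufferSize: 81920, useAsync: true))
        {
            await source.CopyToAsync(target);
        }
    }
}
```

In the question's method this could be called as await StreamCopySketch.CopyToFileAsync(s, targetDir + directoryName + fileName), provided the calling method is itself async. Whether this is faster than the original synchronous loop depends on the disk and the archive, so it is worth measuring before adopting it.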