Tags: .net, vb.net, file, byte, filestream

File splitter method goes wrong when working with decimals


I'm trying to write a file-splitter method that splits a file into chunks of a desired size. It works perfectly for files whose size divides evenly: for example, if the file is 2097152 bytes and I split it into two chunks, the first chunk is 1048576 bytes and the second chunk is 1048576 bytes.

The problem appears when the file size does not divide evenly. For example, if I want to split a file of 8194321 bytes into two (or more) chunks, half the file size is 4097160.5 bytes. Because I need an integer, I set the chunk size to 4097161 bytes, which should produce a first chunk of 4097161 bytes and a second chunk of 4097160 bytes. However, while the method is writing the last chunk, I get a System.ArgumentException on this instruction:

outputStream.Write(buffer, bufferLength * bufferCount, tmpBufferLength)

with this error message:

Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection.
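Stream.Write(buffer, offset, count) requires offset + count to be no larger than buffer.Length. Tracing the loop with the question's numbers: for the second chunk, ReadBytes(4097161) can only return the 4097160 bytes left in the file, but the inner loop still sizes its last slice from chunkSize and calls Write(buffer, 3670016, 427145) (7 * 524288 = 3670016, and 4097161 - 3670016 = 427145), which ends one byte past the array. A minimal reproduction, with a MemoryStream standing in for the chunk file; the WriteBoundsDemo module and WriteFails helper are hypothetical names used only for illustration:

```vbnet
Imports System
Imports System.IO

Public Module WriteBoundsDemo

    ''' <summary>
    ''' Hypothetical helper: writes buffer(offset .. offset + count - 1) to a
    ''' MemoryStream and reports whether Stream.Write rejected the arguments.
    ''' </summary>
    Public Function WriteFails(ByVal bufferSize As Integer, ByVal offset As Integer, ByVal count As Integer) As Boolean
        Dim buffer As Byte() = New Byte(bufferSize - 1) {}
        Using ms As New MemoryStream()
            Try
                ms.Write(buffer, offset, count)
                Return False
            Catch ex As ArgumentException
                ' Thrown when offset + count > buffer.Length. Some newer runtimes throw
                ' ArgumentOutOfRangeException, which derives from ArgumentException.
                Return True
            End Try
        End Using
    End Function

End Module
```

With the question's numbers, WriteFails(4097160, 3670016, 427145) reports a failure, while a count of 427144 (exactly the bytes left in the array) succeeds.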

How can I fix my file-splitter method so that it correctly splits a file whose size does not divide evenly into the requested number of chunks?

Here is a usage example:

Split(sourceFile:=Me.fileToSplit,
      chunkSize:=CInt(New FileInfo(fileToSplit).Length / 2),
      chunkName:="File.Part",
      chunkExt:="fs")
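One detail in the usage example: CInt rounds a Double that ends in exactly .5 to the nearest even integer (banker's rounding), so for the 8194321-byte file CInt(8194321 / 2) returns 4097160, not the 4097161 described above. Two chunks of 4097160 bytes then leave 1 byte over for a third chunk. Math.Ceiling makes the round-up explicit. A minimal comparison; the RoundingDemo module and its function names are illustrative:

```vbnet
Imports System

Public Module RoundingDemo

    ' CInt on a Double uses banker's rounding: a .5 fraction goes to the nearest even integer.
    Public Function HalfWithCInt(ByVal totalSize As Long) As Integer
        Return CInt(totalSize / 2)
    End Function

    ' Math.Ceiling always rounds up, so the first chunk absorbs the odd byte.
    Public Function HalfWithCeiling(ByVal totalSize As Long) As Integer
        Return CInt(Math.Ceiling(totalSize / 2))
    End Function

End Module
```

For 8194321 bytes, HalfWithCInt returns 4097160 and HalfWithCeiling returns 4097161.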

This is the relevant code of the file-splitter procedure:

''' <summary>
''' Splits a file into manageable chunks.
''' </summary>
''' <param name="sourceFile">The file to split.</param>
''' <param name="chunkSize">The size per chunk.</param>
''' <param name="chunkName">The name formatting for chunks.</param>
''' <param name="chunkExt">The file-extension for chunks.</param>
Public Sub Split(ByVal sourceFile As String,
                 ByVal chunkSize As Integer,
                 ByVal chunkName As String,
                 ByVal chunkExt As String)

    ' FileInfo instance of the source file.
    Dim fInfo As New FileInfo(sourceFile)

    ' The total filesize to split, in bytes.
    Dim totalSize As Long = fInfo.Length

    ' The remaining size to calculate the percentage, in bytes.
    Dim sizeRemaining As Long = totalSize

    ' Counts the length of the current chunk file to calculate the percentage, in bytes.
    Dim sizeWritten As Long = 0L

    ' The buffer to read data and write the chunks.
    Dim buffer As Byte() = New Byte() {}

    ' The buffer length.
    Dim bufferLength As Integer = 524288 ' 512 KB

    ' The total amount of chunks to create.
    Dim chunkCount As Long = CLng(Math.Ceiling((fInfo.Length - bufferLength) / (chunkSize)))

    ' Keeps track of the current chunk.
    Dim chunkIndex As Long = 0L

    ' A zero-filled string to enumerate the chunk parts.
    Dim enumeration As String = String.Empty

    ' The chunks filename.
    Dim chunkFilename As String = String.Empty

    ' Open the file to start reading bytes.
    Using inputStream As New FileStream(fInfo.FullName, FileMode.Open)

        Using binaryReader As New BinaryReader(inputStream)

            While (inputStream.Position < inputStream.Length)

                chunkIndex += 1L 'Increment the chunk file counter.

                ' Set chunk filename.
                enumeration = New String("0"c, CStr(chunkCount).Length - CStr(chunkIndex).Length)
                chunkFilename = String.Format("{0}.{1}.{2}", chunkName, enumeration & CStr(chunkIndex), chunkExt)

                ' Reset written byte-length counter.
                sizeWritten = 0L

                ' Create the chunk file to write the bytes to.
                Using outputStream As New FileStream(chunkFilename, FileMode.Create)

                    ' Read until the chunk is full or the end of the input file is reached.
                    While (sizeWritten < chunkSize) AndAlso (inputStream.Position < inputStream.Length)

                        ' Read bytes from the source file.
                        buffer = binaryReader.ReadBytes(chunkSize)

                        Dim bufferCount As Integer = 0
                        Dim tmpBufferLength As Integer = bufferLength

                        While (sizeWritten < chunkSize)

                            If (bufferLength + (bufferLength * bufferCount)) >= chunkSize Then
                                tmpBufferLength = chunkSize - ((bufferLength * bufferCount))
                            End If

                            ' Write those bytes in the chunk file.
                            outputStream.Write(buffer, bufferLength * bufferCount, tmpBufferLength)

                            bufferCount += 1

                            ' Increment the bytes-written counter.
                            sizeWritten += tmpBufferLength

                            ' Decrease the bytes-remaining counter.
                            sizeRemaining -= tmpBufferLength

                            ' Reset the temporary buffer length.
                            tmpBufferLength = bufferLength

                        End While

                    End While ' (sizeWritten < chunkSize) AndAlso (inputStream.Position < inputStream.Length)

                    outputStream.Flush()

                End Using ' outputStream

            End While ' inputStream.Position < inputStream.Length

        End Using ' binaryReader

    End Using ' inputStream

End Sub
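Separately from the exception, the chunkCount formula subtracts bufferLength from the file size, which has nothing to do with the number of chunks. It can undercount: a 5242880-byte file split into 524288-byte chunks produces 10 chunks, but the formula gives Math.Ceiling(4718592 / 524288) = 9, and naming chunk 10 then calls New String("0"c, -1), which throws ArgumentOutOfRangeException. A sketch that computes the count with integer ceiling division and pads the index with the "D" numeric format specifier; GetChunkCount and GetChunkFileName are hypothetical helpers, not part of the original code:

```vbnet
Imports System

Public Module ChunkNaming

    ' Number of chunks needed: the ceiling of totalSize / chunkSize, in integer arithmetic.
    Public Function GetChunkCount(ByVal totalSize As Long, ByVal chunkSize As Long) As Long
        If chunkSize <= 0 Then Throw New ArgumentOutOfRangeException(NameOf(chunkSize))
        Return (totalSize + chunkSize - 1) \ chunkSize
    End Function

    ' Hypothetical helper: builds "name.03.ext"-style names, zero-padding the index
    ' to the number of digits in the total chunk count.
    Public Function GetChunkFileName(ByVal chunkName As String, ByVal chunkIndex As Long,
                                     ByVal chunkCount As Long, ByVal chunkExt As String) As String
        Dim width As Integer = chunkCount.ToString().Length
        Return String.Format("{0}.{1}.{2}", chunkName, chunkIndex.ToString("D" & width.ToString()), chunkExt)
    End Function

End Module
```

GetChunkCount(5242880, 524288) returns 10, and GetChunkFileName("File.Part", 3, 10, "fs") returns "File.Part.03.fs", matching the question's naming scheme.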

EDIT: I forgot to mention why the While (sizeWritten < chunkSize) block exists: I raise some events inside it, so instead of writing the entire buffer at once, I use that loop to write the buffer gradually, slice by slice. This way files are split at the exact size, except when the file size does not divide evenly, in which case the exception mentioned above is thrown.
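If the sliced writes are kept for the events, the loop only needs to be bounded by the number of bytes ReadBytes actually returned (buffer.Length) instead of chunkSize, because for the last chunk that array is shorter. A sketch of the inner loop as a standalone function; the SliceWriter module and the onProgress callback are hypothetical, the callback standing in for the question's events:

```vbnet
Imports System
Imports System.IO

Public Module SliceWriter

    ''' <summary>
    ''' Writes buffer to output in slices of at most sliceLength bytes, invoking
    ''' onProgress (a hypothetical stand-in for the question's events) after each
    ''' slice. Returns the total number of bytes written.
    ''' </summary>
    Public Function WriteInSlices(ByVal output As Stream, ByVal buffer As Byte(),
                                  ByVal sliceLength As Integer,
                                  ByVal onProgress As Action(Of Long)) As Long
        If sliceLength <= 0 Then Throw New ArgumentOutOfRangeException(NameOf(sliceLength))

        Dim written As Long = 0L
        Dim offset As Integer = 0

        ' Bound the loop by what was actually read, not by the requested chunk size.
        While offset < buffer.Length
            Dim count As Integer = Math.Min(sliceLength, buffer.Length - offset)
            output.Write(buffer, offset, count)
            offset += count
            written += count
            If onProgress IsNot Nothing Then onProgress.Invoke(written)
        End While

        Return written
    End Function

End Module
```

Inside the question's outer loop, the inner While block could then be replaced by storing the result of WriteInSlices(outputStream, buffer, bufferLength, Nothing) and adding it to sizeWritten (and subtracting it from sizeRemaining). For the 4097160-byte final buffer this writes seven 524288-byte slices and one 427144-byte slice.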


Solution

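  • You need to calculate the right amount to read before actually reading. Right now you always request chunkSize bytes, but for the last chunk ReadBytes returns fewer bytes than that, while the write loop still computes its offsets from chunkSize, so the final Write runs past the end of the buffer.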

    I think the intention is that tmpBufferLength holds the correct buffer length. Assuming that works out (which I am too lazy to verify...), read exactly that amount from the source and then write the entire buffer to the destination.
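A sketch of that approach, assuming the question's parameters: each read requests Math.Min(bufferLength, bytes still missing from the current chunk), and whatever Read returns is written in full, so the final, shorter chunk needs no offset arithmetic. The FileSplitter module, the SplitFile name and the optional bufferLength parameter are illustrative; chunk names follow the question's "{name}.{index}.{ext}" pattern with a zero-padded index, and the progress events are left out.

```vbnet
Imports System
Imports System.IO

Public Module FileSplitter

    ''' <summary>
    ''' Splits sourceFile into files of at most chunkSize bytes, named
    ''' "{chunkName}.{index}.{chunkExt}" with a zero-padded index.
    ''' </summary>
    Public Sub SplitFile(ByVal sourceFile As String, ByVal chunkSize As Integer,
                         ByVal chunkName As String, ByVal chunkExt As String,
                         Optional ByVal bufferLength As Integer = 524288)
        If chunkSize <= 0 Then Throw New ArgumentOutOfRangeException(NameOf(chunkSize))
        If bufferLength <= 0 Then Throw New ArgumentOutOfRangeException(NameOf(bufferLength))

        Dim totalSize As Long = New FileInfo(sourceFile).Length
        ' Integer ceiling division: 8194321 bytes with chunkSize 4097161 gives 2 chunks.
        Dim chunkCount As Long = (totalSize + chunkSize - 1) \ chunkSize
        Dim width As Integer = Math.Max(1L, chunkCount).ToString().Length
        ' One reusable buffer instead of a new chunkSize-sized array per chunk.
        Dim buffer As Byte() = New Byte(bufferLength - 1) {}

        Using inputStream As New FileStream(sourceFile, FileMode.Open, FileAccess.Read)
            Dim chunkIndex As Long = 0L
            While inputStream.Position < inputStream.Length
                chunkIndex += 1L
                Dim chunkFilename As String = String.Format("{0}.{1}.{2}", chunkName,
                    chunkIndex.ToString("D" & width.ToString()), chunkExt)

                Using outputStream As New FileStream(chunkFilename, FileMode.Create, FileAccess.Write)
                    Dim sizeWritten As Long = 0L
                    While sizeWritten < chunkSize
                        ' Ask only for what this chunk still needs, never more than the buffer holds.
                        Dim toRead As Integer = CInt(Math.Min(bufferLength, chunkSize - sizeWritten))
                        Dim bytesRead As Integer = inputStream.Read(buffer, 0, toRead)
                        If bytesRead = 0 Then Exit While ' End of the source file.
                        ' Write exactly the bytes that were read.
                        outputStream.Write(buffer, 0, bytesRead)
                        sizeWritten += bytesRead
                    End While
                End Using
            End While
        End Using
    End Sub

End Module
```

Splitting the 8194321-byte file with chunkSize 4097161 should produce File.Part.1.fs (4097161 bytes) and File.Part.2.fs (4097160 bytes). Reading into one fixed-size buffer also avoids the chunkSize-sized array that BinaryReader.ReadBytes(chunkSize) allocates for every chunk.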