How to merge a huge number of small files into one file


I have about 1.5 million small files, with a total size of about 80 GB.

I want to merge these files into a single file so that they are fast to copy. I tried archiving them into a zip file and reading individual files with this code:

ZipFile zip = ZipFile.Read(Settings.Default.DataPath);
ZipEntry entry = zip[MyFile];

The idea worked, but it is too slow: loading a single file takes about 30 seconds.

Is there a faster way to merge the files?

Thanks


Solution

  • One way would be to use SQLite (you can add it through a NuGet package) and create a database file that holds all these individual bits of data.

    You would create a table that holds all the files and make the filename the primary key, which would automatically create an index on it:

    CREATE TABLE files
    (
        filename TEXT NOT NULL PRIMARY KEY,
        content BLOB
    )
    

    You would then insert all the files into it, one row per file.

    To retrieve a file's content, you would execute SQL like this:

    SELECT content FROM files WHERE filename = ?
    

    I would encapsulate all of this into a new class so that you separate out the functionality of maintaining and using this file from the rest of your application.
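
As a rough illustration of such a class, here is a minimal sketch. It assumes the Microsoft.Data.Sqlite NuGet package (other SQLite bindings would work similarly), and the class name FileStore and its members are hypothetical, chosen only for this example. It uses the bare file name as the key, which assumes names do not repeat across folders.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.Data.Sqlite; // assumption: Microsoft.Data.Sqlite NuGet package

// Hypothetical wrapper class; the name and members are illustrative only.
public sealed class FileStore : IDisposable
{
    private readonly SqliteConnection _connection;

    public FileStore(string databasePath)
    {
        _connection = new SqliteConnection($"Data Source={databasePath}");
        _connection.Open();

        // Same table as above; the primary key gives an index on filename.
        using var create = _connection.CreateCommand();
        create.CommandText =
            "CREATE TABLE IF NOT EXISTS files (filename TEXT NOT NULL PRIMARY KEY, content BLOB)";
        create.ExecuteNonQuery();
    }

    // Inserts many files inside one transaction instead of committing per row.
    public void AddFiles(IEnumerable<string> paths)
    {
        using var transaction = _connection.BeginTransaction();
        using var insert = _connection.CreateCommand();
        insert.Transaction = transaction;
        insert.CommandText =
            "INSERT OR REPLACE INTO files (filename, content) VALUES ($name, $content)";
        var name = insert.Parameters.Add("$name", SqliteType.Text);
        var content = insert.Parameters.Add("$content", SqliteType.Blob);

        foreach (string path in paths)
        {
            // Assumption: the bare file name is unique; use a relative path otherwise.
            name.Value = Path.GetFileName(path);
            content.Value = File.ReadAllBytes(path);
            insert.ExecuteNonQuery();
        }
        transaction.Commit();
    }

    // Returns the stored bytes, or null if no row has that filename.
    public byte[] Read(string filename)
    {
        using var select = _connection.CreateCommand();
        select.CommandText = "SELECT content FROM files WHERE filename = $name";
        select.Parameters.AddWithValue("$name", filename);
        return select.ExecuteScalar() as byte[];
    }

    public void Dispose() => _connection.Dispose();
}
```

With something like this, the rest of the application would only call AddFiles once to build the database file and Read(MyFile) to load a single entry, and would not need to know that SQLite is used underneath.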