
Joining or merging files in C


I'm using Curl to download files from the internet. If a file is big, I use the Range header to split it into 3 chunks and download them separately, making multiple connections to the same URL. Now the problem is: how do I join those 3 chunks back into one big file? Searching the internet, all I found was merging text files with fgetc, fgets and the like, which interpret the data as text. But my files are mostly video files or big ISO files, hence binary data. I looked into fwrite, but how can I know the size of one element? It's just binary data. I'm confused.

The Curl routine which writes to the various chunks is this:

    size_t write_data(void *ptr, size_t sz, size_t nmemb, FILE *stream) {
        size_t written = fwrite(ptr, sz, nmemb, stream);
        return written;
    }

Let's say I downloaded these 3 chunks, filepath1.x, filepath2.x and filepath3.x. How can I merge them into output.mp4?


Solution

  • If you can use shell utilities, the trivial way to do this is with cat: cat file1.x file2.x file3.x > file.x. cat copies bytes verbatim, so it works for binary files as well as text.

    If you want to do this in pure C, open the files with fopen in binary mode ("rb" for reading, "wb" for writing). The text/binary distinction only matters on Windows; POSIX systems (Unix and macOS) don't make one.

    The file access mode flag "b" can optionally be specified to open a file in binary mode. It has no effect on POSIX systems, but on Windows it disables the special handling of '\n' (translated to and from "\r\n") and '\x1A' (treated as end-of-file when reading in text mode).
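    To see what binary mode guarantees, here is a small sketch that writes bytes text mode would mangle on Windows (CR, LF, Ctrl-Z and a NUL) with "wb", reads them back with "rb" and compares. The helper round_trip_binary and the file name roundtrip_demo.bin are hypothetical, chosen only for this demonstration.

    ```c
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: writes len bytes to path in binary mode, reads them
       back in binary mode, and returns 1 if every byte survived, 0 otherwise. */
    static int round_trip_binary(const char *path, const unsigned char *data, size_t len) {
        unsigned char back[64];
        if (len > sizeof(back)) return 0;

        FILE *out = fopen(path, "wb");   /* "b": no "\n" -> "\r\n" translation on Windows */
        if (out == NULL) return 0;
        size_t written = fwrite(data, 1, len, out);
        if (fclose(out) != 0 || written != len) return 0;

        FILE *in = fopen(path, "rb");    /* "b": '\x1A' is not treated as end-of-file */
        if (in == NULL) return 0;
        size_t got = fread(back, 1, sizeof(back), in);
        fclose(in);
        return got == len && memcmp(back, data, len) == 0;
    }

    int main(void) {
        /* Bytes that Windows text mode would alter or stop at. */
        const unsigned char sample[] = { 'A', '\r', '\n', 0x1A, 0x00, 'Z' };
        int ok = round_trip_binary("roundtrip_demo.bin", sample, sizeof(sample));
        printf("binary round trip %s\n", ok ? "preserved every byte" : "changed the data");
        remove("roundtrip_demo.bin");
        return ok ? 0 : 1;
    }
    ```

    On POSIX the same program would also pass with "w"/"r"; on Windows only the "b" variants are guaranteed to preserve the data.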

    If this is just a small utility program, write to stdout and use shell redirection (>) to send the output to a file, just like cat.

    We read each file by allocating a fixed buffer, then repeatedly reading a chunk into it and writing that chunk out. I like to use the BUFSIZ constant from <stdio.h>: it is the C library's default stdio buffer size and is often close to your system's block size, which helps keep reads efficient. 4096 is also a good value, since 4 KiB is a common block size.
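    If you want to check how BUFSIZ compares with the block size on your system, a POSIX-only sketch can ask the OS through fstat, whose st_blksize field reports the preferred I/O size for a file. The helper preferred_buffer_size is hypothetical, and the fallback to BUFSIZ is an assumption for systems that report nothing useful.

    ```c
    #define _XOPEN_SOURCE 700   /* expose fileno() and st_blksize (POSIX/XSI) */
    #include <stdio.h>
    #include <sys/stat.h>

    /* Hypothetical helper: returns the preferred I/O block size the OS reports
       for an open file, falling back to BUFSIZ if it cannot be determined. */
    static size_t preferred_buffer_size(FILE *f) {
        struct stat st;
        if (fstat(fileno(f), &st) == 0 && st.st_blksize > 0) {
            return (size_t)st.st_blksize;
        }
        return BUFSIZ;
    }

    int main(void) {
        FILE *f = tmpfile();   /* any open file works; a temp file keeps this self-contained */
        if (f == NULL) {
            perror("tmpfile");
            return 1;
        }
        printf("BUFSIZ = %ld, st_blksize = %zu\n", (long)BUFSIZ, preferred_buffer_size(f));
        fclose(f);
        return 0;
    }
    ```

    Both numbers vary by platform and filesystem, so treat either one as a reasonable buffer size rather than a magic value.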

    fread and fwrite are odd. Rather than just telling them how many bytes to read, we tell them to read X objects of Y bytes each. This is a holdover from record-oriented filesystems and is most useful when you're reading a list of fixed-size objects. For raw binary data the "element" is simply one byte, so we read 1-byte objects N times. Asking to read up to the size of our buffer is: fread(buf, 1, sizeof(buf), f).
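    The argument order matters because fread counts only complete objects. This sketch reads the same 50-byte temporary file twice: once as up to 4096 one-byte objects and once as a single 4096-byte object. The helper fread_from_start is hypothetical.

    ```c
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: rewinds f (clearing EOF) and calls fread once. */
    static size_t fread_from_start(FILE *f, void *buf, size_t size, size_t nmemb) {
        rewind(f);
        return fread(buf, size, nmemb, f);
    }

    int main(void) {
        FILE *f = tmpfile();
        if (f == NULL) {
            perror("tmpfile");
            return 1;
        }

        /* Put exactly 50 bytes of binary data in the file. */
        unsigned char data[50];
        memset(data, 0xAB, sizeof(data));
        fwrite(data, 1, sizeof(data), f);

        unsigned char buf[4096];
        /* Up to 4096 objects of 1 byte: returns the number of bytes read. */
        size_t as_bytes = fread_from_start(f, buf, 1, sizeof(buf));
        /* One object of 4096 bytes: the partial object does not count. */
        size_t as_object = fread_from_start(f, buf, sizeof(buf), 1);

        printf("fread(buf, 1, 4096, f) = %zu\n", as_bytes);   /* prints 50 */
        printf("fread(buf, 4096, 1, f) = %zu\n", as_object);  /* prints 0 */
        fclose(f);
        return 0;
    }
    ```

    With the size set to sizeof(buf), the last short chunk of every file would report 0 and its bytes would never be written, which is why the element size is 1.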

    fread returns the number of objects read. Since we're reading 1-byte objects, that is the number of bytes. We pass that count to fwrite, so if BUFSIZ is 4096 bytes but the file is only 50 bytes, we write 50 bytes, not 50 bytes plus 4046 bytes of trash.
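    The "write only what was read" rule can be checked in isolation. This sketch copies a 50-byte temporary file through a 4096-byte buffer and confirms the output is also 50 bytes. The helper copy_stream is hypothetical and returns -1 on any read or write error.

    ```c
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: copies all of in to out through a fixed buffer.
       Returns the number of bytes written, or -1 on a read or write error. */
    static long copy_stream(FILE *in, FILE *out) {
        unsigned char buf[4096];
        long total = 0;
        size_t n;
        while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
            /* Write only the n bytes fread produced, never the whole buffer. */
            if (fwrite(buf, 1, n, out) != n) return -1;
            total += (long)n;
        }
        return ferror(in) ? -1 : total;
    }

    int main(void) {
        FILE *in = tmpfile();
        FILE *out = tmpfile();
        if (in == NULL || out == NULL) {
            perror("tmpfile");
            return 1;
        }

        unsigned char data[50];
        memset(data, 'x', sizeof(data));
        fwrite(data, 1, sizeof(data), in);
        rewind(in);

        long copied = copy_stream(in, out);
        /* ftell on a binary stream gives the byte offset, i.e. the output size here. */
        printf("copied %ld bytes; output is %ld bytes\n", copied, ftell(out));
        fclose(in);
        fclose(out);
        return 0;
    }
    ```

    This prints "copied 50 bytes; output is 50 bytes": the unused 4046 bytes of the buffer never reach the output.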

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    
    int main(int argc, char *argv[]) {
        // Allocate a buffer for reading.
        char buf[BUFSIZ];
        int status = 0;
    
        // Or FILE *output = fopen(output_file, "wb");
        // On Windows stdout is in text mode, so an explicit "wb" file is safer there.
        FILE *output = stdout;
    
        // Iterate through the filenames given on the command line, in order.
        for( int i = 1; i < argc; i++ ) {
            // Open the file for binary reading.
            const char *filename = argv[i];
            FILE *f = fopen(filename, "rb");
            if( f == NULL ) {
                fprintf(stderr, "Could not open %s: %s\n", filename, strerror(errno));
                status = 1;
                continue;  // Skip this file; calling fread on NULL would crash.
            }
    
            // Read up to BUFSIZ bytes at a time and write exactly what was read.
            size_t bytes_read;
            while( (bytes_read = fread(buf, 1, sizeof(buf), f)) > 0 ) {
                if( fwrite(buf, 1, bytes_read, output) != bytes_read ) {
                    perror("fwrite");
                    fclose(f);
                    return 1;
                }
            }
            if( ferror(f) ) {
                fprintf(stderr, "Error reading %s\n", filename);
                status = 1;
            }
    
            // Close the input file.
            fclose(f);
        }
    
        if( output != stdout ) {
            fclose(output);
        }
        return status;
    }
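    For the exact case in the question, a sketch that writes the three chunks straight into output.mp4 (opened with "wb") instead of going through stdout might look like this. The helper names append_file and merge_chunks are hypothetical; it assumes the chunk files are in the current directory and are listed in byte-range order, and it stops at the first missing chunk, since a gap would silently corrupt the video.

    ```c
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: appends the whole file at path to out. 0 on success, -1 on error. */
    static int append_file(FILE *out, const char *path) {
        FILE *in = fopen(path, "rb");
        if (in == NULL) {
            fprintf(stderr, "Could not open %s: %s\n", path, strerror(errno));
            return -1;
        }
        char buf[BUFSIZ];
        size_t n;
        int rc = 0;
        while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
            if (fwrite(buf, 1, n, out) != n) {
                rc = -1;
                break;
            }
        }
        if (ferror(in)) rc = -1;
        fclose(in);
        return rc;
    }

    /* Hypothetical helper: concatenates chunks, in order, into output_path.
       Returns 0 on success; on failure the partial output file is removed. */
    static int merge_chunks(const char *output_path, const char *const chunks[], size_t count) {
        FILE *out = fopen(output_path, "wb");
        if (out == NULL) {
            fprintf(stderr, "Could not create %s: %s\n", output_path, strerror(errno));
            return -1;
        }
        int rc = 0;
        for (size_t i = 0; i < count && rc == 0; i++) {
            rc = append_file(out, chunks[i]);
        }
        if (fclose(out) != 0) rc = -1;   /* fclose flushes; a failed flush is a failed write */
        if (rc != 0) remove(output_path);
        return rc;
    }

    int main(void) {
        /* Chunk names from the question, in the order of their byte ranges. */
        const char *const chunks[] = { "filepath1.x", "filepath2.x", "filepath3.x" };
        if (merge_chunks("output.mp4", chunks, sizeof(chunks) / sizeof(chunks[0])) != 0) {
            fprintf(stderr, "merge failed\n");
            return 1;
        }
        return 0;
    }
    ```

    Once the merge succeeds, the chunk files can be deleted with remove() if they are no longer needed.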
    }