Tags: c, directory, openssl

Recursively calculate SHA256 sum of all files in directory using OpenSSL


I am trying to recursively calculate the SHA-256 sum of every file in a directory using OpenSSL.

This is my code:

#include <stdlib.h>
#include <stdio.h>
#include <dirent.h>
#include <string.h>
#include <openssl/sha.h>
#include <openssl/md5.h>

#define _MAX_LINE_ 256

int sha256_file (char* path, char output[65]){

    FILE* file = fopen(path, "rb");
    unsigned char hash[SHA256_DIGEST_LENGTH];
    const int bufSize = 32768;
    char* buffer = malloc(bufSize);
    int bytesRead = 0;
    SHA256_CTX sha256;

    if(!file)
        return -1;
        
    if(!buffer)
        return -1;

    SHA256_Init(&sha256);

    while((bytesRead = fread(buffer, 1, bufSize, file))){
        SHA256_Update(&sha256, buffer, bytesRead);
    }

    SHA256_Final(hash, &sha256);

    sha256_hash_string(hash, output);

    fclose(file);
    free(buffer);

    return 0;
}      

void sha256_hash_string (unsigned char hash[SHA256_DIGEST_LENGTH], char outputBuffer[65]){
    int i = 0;

    for(i = 0; i < SHA256_DIGEST_LENGTH; i++){
        sprintf(outputBuffer + (i * 2), "%02x", (unsigned char)hash[i]);
    }

    outputBuffer[64] = 0;
}


void traverse_dirs(char* base_path){

    char path[_MAX_LINE_];
    struct dirent* dp;
    DIR* dir = opendir(base_path);
    unsigned char file_sha[65];
    char* md5_command;

    if(!dir)
        return;

    while((dp = readdir(dir)) != NULL){
        if(strcmp(dp->d_name, ".") != 0 && strcmp(dp->d_name, "..") != 0){
            
            // calculate the sha256 sum of the file
            sha256_file(dp->d_name, file_sha);

            // print the name of the file followed by the sha256 sum
            printf("%s -> %s\n", dp->d_name, file_sha);

            strcpy(path, base_path);
            strcat(path, "/");
            strcat(path, dp->d_name);

            traverse_dirs(path);

        }
    }

    closedir(dir);

}

int main(int argc, char* argv[]){

    if(argc < 2){
        printf("Usage: <executable> <dirname>\n");
        exit(-1);
    }
        
    traverse_dirs(argv[1]);

    return 0;

}

The sha256_file() function produces the correct sha256sum for each file, as I have tested manually.

The traverse_dirs() function also works fine, as it correctly prints the contents of the directory provided.

The problem is that they don't work together. I have figured out that the file is not being opened in the sha256_file() function (fopen() returns NULL), but I don't understand why. If I call the function manually on each file, it works just fine.

Any ideas why?


Solution

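  • The call sha256_file(dp->d_name, file_sha) won't work because dp->d_name is only the entry's name within the directory being read, and fopen() resolves it relative to your current working directory, which is not that directory. You need to pass the path that you construct in path[] instead.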

    You should call sha256_file(path, file_sha) only if path is a regular file or a symbolic link (whether to process symbolic links is your call), call traverse_dirs(path) only if path is a directory, and do nothing with the entry otherwise. You can check for those using d_type. See the man page for dirent. Be aware that not every filesystem fills in d_type: it can be DT_UNKNOWN, in which case you need lstat() or stat() to find out what the entry is.
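
    The d_type check with a fallback for DT_UNKNOWN could be sketched as follows. The helper name entry_type() is made up for illustration, and the sketch assumes a system whose dirent.h provides d_type and the DT_* constants (glibc and the BSDs do).

    ```c
    #define _DEFAULT_SOURCE   /* asks glibc to expose the DT_* constants */
    #include <dirent.h>
    #include <sys/stat.h>

    /* Hypothetical helper: returns the DT_* type of an entry, asking lstat()
       only when readdir() reported DT_UNKNOWN. Types other than regular
       file, directory and symbolic link are reported as DT_UNKNOWN. */
    static unsigned char entry_type(const char *path, unsigned char d_type)
    {
        struct stat st;

        if (d_type != DT_UNKNOWN)
            return d_type;              /* the filesystem already told us */
        if (lstat(path, &st) != 0)
            return DT_UNKNOWN;          /* vanished or not accessible */
        if (S_ISREG(st.st_mode))
            return DT_REG;
        if (S_ISDIR(st.st_mode))
            return DT_DIR;
        if (S_ISLNK(st.st_mode))
            return DT_LNK;
        return DT_UNKNOWN;
    }
    ```

    In the directory loop you would then test entry_type(path, dp->d_type) instead of dp->d_type, after the full path has been built. lstat() is used rather than stat() so that a symbolic link is reported as a link and not as whatever it points to.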

    For efficiency, you could have just one path[] that gets passed through the traverse_dirs() calls, appending to path[] for each entry. That would use much less stack space, and would be faster as well, with much less copying. You would also declare file_sha[] only in the block that computes the SHA-256, so that it doesn't take up stack space at every level of the recursion either.

    Something like:

    void traverse_dirs(char* path, size_t end) {
        DIR* dir = opendir(path);
        if (dir == NULL)
            return;
        // entry names get appended after this separator
        path[end] = '/';
        struct dirent* dp;
        while ((dp = readdir(dir)) != NULL) {
            // skip "." and ".."
            if (dp->d_name[0] == '.' && (dp->d_name[1] == 0 ||
                 (dp->d_name[1] == '.' && dp->d_name[2] == 0)))
                continue;
            strcpy(path + end + 1, dp->d_name);
            if (dp->d_type == DT_REG || dp->d_type == DT_LNK) {
                // 64 hex digits plus the terminating zero
                char file_sha[65];
                // print only if the file could actually be hashed
                if (sha256_file(path, file_sha) == 0)
                    printf("%s -> %s\n", path, file_sha);
            }
            else if (dp->d_type == DT_DIR)
                traverse_dirs(path, end + 1 + strlen(dp->d_name));
        }
        closedir(dir);
    }
    

    This would be called initially with path pointing to a buffer that has enough space for the maximum path length plus one, and that contains the path of the directory to open. The second argument would be the length of that path, or zero if the path is the root, "/" (to avoid a double slash). You should get the maximum path length from limits.h as PATH_MAX, assuming your system is POSIX compliant. If not, 32K should be safe. 256 isn't.
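
    Setting up that first call might look like the sketch below. prepare_path() is a hypothetical helper, not something from the code above. Stripping trailing slashes is an extra assumption on my part, so that an argument like "/tmp/" doesn't produce double slashes either.

    ```c
    #include <limits.h>
    #include <stddef.h>
    #include <string.h>

    #ifndef PATH_MAX
    #define PATH_MAX 32768   /* the fallback size suggested above */
    #endif

    /* Hypothetical helper: copies root into buf (cap bytes) and stores in
       *end the value to pass as the second argument of traverse_dirs().
       Returns 0 on success, -1 if root is empty or does not fit. */
    static int prepare_path(char *buf, size_t cap, const char *root, size_t *end)
    {
        size_t len = strlen(root);

        if (len == 0 || len >= cap)
            return -1;
        memcpy(buf, root, len + 1);
        /* ignore trailing slashes when measuring the length */
        while (len > 0 && buf[len - 1] == '/')
            len--;
        if (len == 0) {
            /* root was "/" (or "//"): keep "/" for opendir(), length zero */
            buf[0] = '/';
            buf[1] = 0;
        } else {
            buf[len] = 0;
        }
        *end = len;
        return 0;
    }
    ```

    In main() you would declare char path[PATH_MAX + 1] and a size_t end, call prepare_path(path, sizeof path, argv[1], &end), and on success call traverse_dirs(path, end).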

    If your struct dirent has d_namlen, then you can use that instead of the strlen(). Or if you have stpcpy(), you can use that instead of strcpy() and compute the end from its return value. You should also consider guarding against writing past the end of path[]. If you don't mind globals, path could be a global, so that the same pointer isn't passed unnecessarily in every call.
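
    One way to guard against overflowing path[] is to do the append through a small bounds-checked helper. This is only a sketch: append_name() and its cap parameter (the size of the buffer) are names I made up, and it uses memcpy() rather than stpcpy() so that it doesn't depend on stpcpy() being available.

    ```c
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical helper: writes '/' followed by name at path[end], but only
       if the result, including the terminating zero, fits in cap bytes.
       Returns the new end (the index of the terminating zero), or 0 if the
       name does not fit, in which case path is left untouched. */
    static size_t append_name(char *path, size_t cap, size_t end, const char *name)
    {
        size_t len = strlen(name);

        /* room is needed for '/', the name, and the terminating zero */
        if (end >= cap || len + 2 > cap - end)
            return 0;
        path[end] = '/';
        memcpy(path + end + 1, name, len + 1);
        return end + 1 + len;
    }
    ```

    The return value is what you would pass as end to the recursive call, which also saves the separate strlen() there. A return of 0 can only mean failure, since a successful append always ends at index 1 or later.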

    Move your malloc() after the first return, and if it fails, close the open file before returning. Otherwise you leak the buffer when fopen() fails, and the open file when malloc() fails. Or just allocate the 32K on the stack. That's a small amount for the stack.
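
    The stack-buffer variant of the read loop could be sketched like this. To keep the sketch independent of OpenSSL, the hashing step is passed in as a callback. feed_file(), chunk_fn and count_bytes() are made-up names, and in your program the callback would be a thin wrapper around SHA256_Update().

    ```c
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical callback type: receives each chunk read from the file. */
    typedef void (*chunk_fn)(void *ctx, const void *data, size_t len);

    /* Reads the file in 32K chunks from a stack buffer and hands each chunk
       to update(). Returns 0 on success, -1 if the file cannot be opened or
       a read error occurs. There is no malloc(), so nothing can leak. */
    static int feed_file(const char *path, chunk_fn update, void *ctx)
    {
        unsigned char buffer[32768];
        size_t n;
        int status;
        FILE *file = fopen(path, "rb");

        if (file == NULL)
            return -1;                  /* nothing to release yet */
        while ((n = fread(buffer, 1, sizeof buffer, file)) > 0)
            update(ctx, buffer, n);
        status = ferror(file) ? -1 : 0; /* tell a read error from end of file */
        fclose(file);
        return status;
    }

    /* Example callback: adds up the chunk sizes in a size_t that ctx points to. */
    static void count_bytes(void *ctx, const void *data, size_t len)
    {
        (void)data;
        *(size_t *)ctx += len;
    }
    ```

    Checking ferror() after the loop also means that a failed read is reported as an error instead of being silently hashed as a shorter file.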