I am trying to recursively calculate the SHA-256 sum of every file in a directory using OpenSSL.
This is my code:
#include <stdlib.h>
#include <stdio.h>
#include <dirent.h>
#include <string.h>
#include <openssl/sha.h>
#include <openssl/md5.h>
#define _MAX_LINE_ 256
int sha256_file (char* path, char output[65]){
    FILE* file = fopen(path, "rb");
    unsigned char hash[SHA256_DIGEST_LENGTH];
    const int bufSize = 32768;
    char* buffer = malloc(bufSize);
    int bytesRead = 0;
    SHA256_CTX sha256;
    if(!file)
        return -1;
    if(!buffer)
        return -1;
    SHA256_Init(&sha256);
    while((bytesRead = fread(buffer, 1, bufSize, file))){
        SHA256_Update(&sha256, buffer, bytesRead);
    }
    SHA256_Final(hash, &sha256);
    sha256_hash_string(hash, output);
    fclose(file);
    free(buffer);
    return 0;
}
void sha256_hash_string (unsigned char hash[SHA256_DIGEST_LENGTH], char outputBuffer[65]){
    int i = 0;
    for(i = 0; i < SHA256_DIGEST_LENGTH; i++){
        sprintf(outputBuffer + (i * 2), "%02x", (unsigned char)hash[i]);
    }
    outputBuffer[64] = 0;
}
void traverse_dirs(char* base_path){
    char path[_MAX_LINE_];
    struct dirent* dp;
    DIR* dir = opendir(base_path);
    unsigned char file_sha[65];
    char* md5_command;
    if(!dir)
        return;
    while((dp = readdir(dir)) != NULL){
        if(strcmp(dp->d_name, ".") != 0 && strcmp(dp->d_name, "..") != 0){
            // calculate the sha256 sum of the file
            sha256_file(dp->d_name, file_sha);
            // print the name of the file followed by the sha256 sum
            printf("%s -> %s\n", dp->d_name, file_sha);
            strcpy(path, base_path);
            strcat(path, "/");
            strcat(path, dp->d_name);
            traverse_dirs(path);
        }
    }
    closedir(dir);
}
int main(int argc, char* argv[]){
    if(argc < 2){
        printf("Usage: <executable> <dirname>\n");
        exit(-1);
    }
    traverse_dirs(argv[1]);
    return 0;
}
The sha256_file() function produces the correct sha256sum for each file, as I have tested manually.
The traverse_dirs() function also works fine, as it correctly prints the contents of the directory provided.
The problem is they don't work together. I have figured out that the file is not opening correctly in the sha256_file() function (fopen returns NULL) but I don't get why. If I use it manually on every file, it works just fine.
Any ideas why?
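The call sha256_file(dp->d_name, file_sha) won't work because dp->d_name is only the entry's name, and fopen() resolves it relative to the current working directory, not the directory you are traversing. You need to pass the full path that you construct in path[] instead, and construct it before making the call.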
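As a minimal sketch of building that full path safely, assuming you keep the one-buffer-per-level approach from the question: join_path() is a hypothetical helper name, not something from your code or from any library. It uses snprintf() so that an overlong name is reported rather than overflowing the buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "dir/name" into out (cap bytes).
 * Returns 0 on success, -1 if the result would not fit. */
static int join_path(char *out, size_t cap, const char *dir, const char *name)
{
    size_t len = strlen(dir);
    /* Skip the separator when dir already ends in '/', e.g. the root "/". */
    const char *sep = (len > 0 && dir[len - 1] == '/') ? "" : "/";
    int n = snprintf(out, cap, "%s%s%s", dir, sep, name);

    if (n < 0 || (size_t)n >= cap)
        return -1; /* encoding error or truncated output */
    return 0;
}
```

In traverse_dirs() you would call join_path(path, sizeof path, base_path, dp->d_name) first, and only then pass path to sha256_file().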
You should only call sha256_file() on path if path is a regular file or a symbolic link (if you want to process symbolic links; that's your call), only call traverse_dirs(path) if path is a directory, and do nothing with the entry otherwise. You can check for those using d_type. See the man page for dirent.
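One caveat worth a sketch: d_type is an extension rather than something POSIX requires, and some filesystems report DT_UNKNOWN for every entry. The following is one possible way to handle that, assuming a system that has d_type at all (Linux, the BSDs, macOS). classify_entry() and the entry_kind enum are hypothetical names introduced for illustration; the fallback uses lstat() on the full path.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes d_type and DT_* on glibc in strict modes */
#endif
#include <dirent.h>
#include <sys/stat.h>

/* Hypothetical classification used only in this sketch. */
enum entry_kind { ENTRY_OTHER, ENTRY_FILE, ENTRY_LINK, ENTRY_DIR };

/* path must be the full path to the entry, not just dp->d_name. */
static enum entry_kind classify_entry(const char *path, const struct dirent *dp)
{
    struct stat st;

    switch (dp->d_type) {
    case DT_REG:     return ENTRY_FILE;
    case DT_LNK:     return ENTRY_LINK;
    case DT_DIR:     return ENTRY_DIR;
    case DT_UNKNOWN: break;              /* filesystem did not say: ask lstat() */
    default:         return ENTRY_OTHER; /* socket, FIFO, device, ... */
    }

    /* lstat() does not follow symbolic links, so links are reported as links. */
    if (lstat(path, &st) != 0)
        return ENTRY_OTHER;
    if (S_ISREG(st.st_mode)) return ENTRY_FILE;
    if (S_ISLNK(st.st_mode)) return ENTRY_LINK;
    if (S_ISDIR(st.st_mode)) return ENTRY_DIR;
    return ENTRY_OTHER;
}
```

Recursing only on ENTRY_DIR, and never on ENTRY_LINK, also keeps the traversal from looping through a symbolic link that points back up the tree.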
For efficiency, you could have just one path[] that gets passed through the traverse_dirs() calls, appending to path[] for each entry. That would use much less stack space, and would be faster as well with much less copying. You would also declare file_sha[] only in the block that computes the SHA-256, so you're not wasting stack space on it at every level of recursion.
Something like:
void traverse_dirs(char* path, size_t end) {
    DIR* dir = opendir(path);
    if (dir == NULL)
        return;
    path[end] = '/';
    struct dirent* dp;
    while ((dp = readdir(dir)) != NULL) {
        // skip "." and ".."
        if (dp->d_name[0] == '.' && (dp->d_name[1] == 0 ||
            (dp->d_name[1] == '.' && dp->d_name[2] == 0)))
            continue;
        // append the entry name right after the slash
        strcpy(path + end + 1, dp->d_name);
        if (dp->d_type == DT_REG || dp->d_type == DT_LNK) {
            // char, not unsigned char, to match sha256_file() and %s
            char file_sha[65];
            if (sha256_file(path, file_sha) == 0)
                printf("%s -> %s\n", path, file_sha);
        }
        else if (dp->d_type == DT_DIR)
            traverse_dirs(path, end + 1 + strlen(dp->d_name));
    }
    closedir(dir);
}
This would be called initially with path pointing to a buffer that has enough space for the maximum path size plus one, and that contains the path to open as a directory. The second argument would be the length of that path, or zero if the path is the root, "/" (to avoid double slashes). You should get the maximum path length from limits.h as PATH_MAX, assuming your system is POSIX compliant. If not, 32K should be safe. 256 isn't.
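A rough sketch of that initial setup, under a few assumptions: make_path_buffer() is a hypothetical helper name, the buffer is heap-allocated (a global or a local array in main() would do as well), and any trailing slashes on the starting path are treated like the root case so they don't produce double slashes either.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 32768 /* fallback when limits.h does not define PATH_MAX */
#endif

/* Hypothetical helper: copies root into a new buffer of PATH_MAX + 1 bytes
 * and stores in *end the offset where traverse_dirs() should write '/'.
 * Returns NULL if root is empty, too long, or allocation fails.
 * The caller frees the buffer. */
static char *make_path_buffer(const char *root, size_t *end)
{
    size_t len = strlen(root);
    char *path;

    if (len == 0 || len >= PATH_MAX)
        return NULL;
    path = malloc((size_t)PATH_MAX + 1);
    if (path == NULL)
        return NULL;
    memcpy(path, root, len + 1);

    /* Ignore trailing slashes when computing end, so "/" gives 0 and
     * "src/" gives 3. The string itself is left intact so that the first
     * opendir(path) still sees a valid directory name. */
    while (len > 0 && root[len - 1] == '/')
        len--;
    *end = len;
    return path;
}
```

main() would then do something like: char *path = make_path_buffer(argv[1], &end); check it for NULL, call traverse_dirs(path, end), and free(path) afterwards.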
If your struct dirent has d_namlen, then you can use that instead of the strlen(). Or if you have stpcpy(), you can use that instead of strcpy() and compute the end from its return value. You can also consider guarding against writing past the end of path[]. If you don't mind globals, path could be a global, so that the same pointer isn't passed unnecessarily in every call.
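The guard against overrunning path[] could look roughly like this. It is only a sketch: append_entry() is a hypothetical helper that would replace the path[end] = '/' and strcpy() pair, and it assumes you also pass the buffer's capacity (cap) down through the recursion or keep it in a global.

```c
#include <string.h>

/* Hypothetical helper: writes "/name" at path[end] if it fits in cap bytes.
 * Returns the new end (the offset of the terminating NUL), or 0 if the
 * entry does not fit, in which case path is left unchanged. */
static size_t append_entry(char *path, size_t cap, size_t end, const char *name)
{
    size_t len = strlen(name);

    /* Room is needed for '/', the name, and the terminating NUL. */
    if (end >= cap || len + 2 > cap - end)
        return 0;
    path[end] = '/';
    memcpy(path + end + 1, name, len + 1);
    return end + 1 + len;
}
```

A return value of 0 is unambiguous because a successful append always ends at offset 1 or later. The returned offset is also exactly what the recursive traverse_dirs() call needs as its second argument, so the separate strlen() goes away.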
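As written, sha256_file() leaks the buffer when fopen() fails and leaves the file open when malloc() fails. Move your malloc() after the first return, and if it fails, close the open file before returning. Or just allocate the 32K on the stack. That's a small amount for the stack.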
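To illustrate the stack-buffer variant without tying the sketch to a particular OpenSSL version, the read loop below hands each chunk to a callback. The chunk_fn type, read_file_chunks(), and count_bytes() are all hypothetical names made up for this example; in your code the callback's job would be done by the SHA256_Update() call.

```c
#include <stdio.h>

/* Hypothetical callback type: receives each chunk read from the file. */
typedef void (*chunk_fn)(void *ctx, const unsigned char *data, size_t len);

/* Example sink that just adds up the number of bytes seen. */
static void count_bytes(void *ctx, const unsigned char *data, size_t len)
{
    (void)data;
    *(size_t *)ctx += len;
}

/* Reads path in 32K chunks from a stack buffer and feeds them to sink.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
static int read_file_chunks(const char *path, chunk_fn sink, void *ctx)
{
    unsigned char buffer[32768]; /* on the stack: nothing to free */
    size_t n;
    int rc = 0;
    FILE *file = fopen(path, "rb");

    if (file == NULL)
        return -1; /* nothing has been acquired yet, so nothing leaks */
    while ((n = fread(buffer, 1, sizeof buffer, file)) > 0)
        sink(ctx, buffer, n);
    if (ferror(file))
        rc = -1; /* a short read caused by an error, not end of file */
    fclose(file);
    return rc;
}
```

With only one resource to release, there is a single fclose() on every path after the open succeeds, and fread()'s size_t result is kept as a size_t rather than being squeezed into an int.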