I am trying to compute a root hash for a text file. First I calculate the SHA-1 hash of each 64-byte line, then I concatenate those hashes and hash the concatenation again. My overall process looks like this:
Read the file in 64-byte lines > hash each line and write the hashes to [hashes.txt] > concatenate the hashes two at a time and write them to [temp_hashes.txt] > hash the concatenated pairs and write the results back to [hashes.txt].
I repeat this process until the length of [hashes.txt] is one. Finally, I write this to my permanent record [secure.txt].
I am using the library . For testing I've used two text files, call them [one.txt] and [two.txt], both containing excerpts of lorem ipsum. Everything seems fine up to the first step, where the 64-byte lines are hashed, but as soon as I start combining the hashes, the root hash is different every time I run the code. I have tried emptying both [hashes.txt] and [temp_hashes.txt] and re-running.
This is my first hash step:

```c
char buf[64];
unsigned char all_hashes[TABLE_SIZE][21];
unsigned char md[SHA_DIGEST_LENGTH];
while (fgets(buf, sizeof(buf), fptr) != NULL){
    get_sha1_hash(buf, sizeof(buf), md);
    for(int i = 0; i < SHA_DIGEST_LENGTH; i++)
        fprintf(outfile, "%02x", md[i]);
    fprintf(outfile, "\n");
}
```
The combining step is something like this:

```c
char * temp = malloc(sizeof(char)*100);
char * line = malloc(sizeof(char)*100);
int k = 0;
while (fgets(line, 100, file) != NULL) {
    line[strlen(line)-1] = '\0';
    if (k%2 == 0) {
        fprintf(outfile, "%s", line);
    }
    else {
        fprintf(outfile, "%s\n", line);
    }
    k++;
}
```
And this is the re-hash step:

```c
char line[1024]; // I guess the same as char line[100]
int i = 0;
unsigned char md[SHA_DIGEST_LENGTH];
while(fgets(line, sizeof(line), infile) != NULL) {
    get_sha1_hash(line, sizeof(line), md);
    for(int i = 0; i<SHA_DIGEST_LENGTH; i++)
        fprintf(outfile, "%02x", md[i]);
    fprintf(outfile, "%s", "\n");
}
```
Finally, everything comes together like this:

```c
while(calculate_length_of_file("hashes.txt") > 1) {
    combine_hashes_by_two();
    hash_file_line_by_line();
}
```
I am just starting out with C and have made trivial memory mistakes before, so I suspect it is something simple here too; I just can't seem to crack it.
Any and all help will be greatly appreciated, thank you!
The problem is:
Here, you read a line into the buffer `buf[64]`:

```c
while (fgets(buf, sizeof(buf), fptr) != NULL){
```
Here, you hash the complete buffer:

```c
get_sha1_hash(buf, sizeof(buf), md);
```
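But `fgets()` might not have filled the whole buffer: it stops reading after the next newline! (It also reads at most `sizeof(buf) - 1` characters, to leave room for the terminating `'\0'`.)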
So you probably meant to hash:

```c
get_sha1_hash(buf, strlen(buf), md);
```
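Otherwise, you also hash whatever bytes sit in `buf` after the string's terminating `'\0'` (uninitialized memory on the first read, leftovers from earlier lines afterwards), which leads to (pseudo-)random results. The re-hash step has the same bug: `get_sha1_hash(line, sizeof(line), md)` hashes all 1024 bytes of `line`, not just the characters `fgets()` read, so it needs `strlen(line)` as well.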