For a course assignment I have to implement the Huffman encoding & decoding algorithm, first in the classic serial way, and then try to parallelize it using various methods (OpenMP, MPI, pthreads). The goal of the project is not necessarily to make it faster, but to analyze the results and discuss why they come out the way they do.
The serial version works perfectly. However, for the parallel version, I am stuck on a file-reading problem. In the serial version, I have a piece of code that looks like this:
char *buffer = calloc(1, MAX_BUFF_SZ);
while ((bytes_read = fread(buffer, 1, MAX_BUFF_SZ, input)) > 0) {
    compress_chunk(buffer, t, output);
    memset(buffer, 0, MAX_BUFF_SZ);
}
This reads at most MAX_BUFF_SZ bytes from the input file and then compresses them. I used the memset call to handle the case when bytes_read < MAX_BUFF_SZ (maybe a cleaner solution exists, though).
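One hypothetical cleaner variant, assuming compress_chunk could be changed to take an explicit length parameter (the real signature has none), would drop the memset entirely:

/* Sketch only: a length-aware compress_chunk (hypothetical signature)
 * would never look past the valid bytes, so no NUL padding is needed. */
while ((bytes_read = fread(buffer, 1, MAX_BUFF_SZ, input)) > 0) {
    compress_chunk(buffer, bytes_read, t, output);
}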
However, for the parallel version (using OpenMP, for example), I want each thread to analyze only a portion of the file, while still reading it in chunks. Knowing that each thread has an id thread_id and that there are at most total_threads threads, I calculate the start and end positions as follows:
int slice_size = (file_size + total_threads - 1) / total_threads;
int start = slice_size * thread_id;
int end = min((thread_id + 1) * slice_size, file_size);
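For example, with file_size = 100 and total_threads = 4, slice_size is 25, so thread 2 works on bytes [50, 75).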
I can move to the start position with a simple fseek(input, start, SEEK_SET) call. However, I am not able to read the content in chunks. I tried the following code (just to make sure the operation works):
int total_bytes = 0;
while ((bytes_read = fread(buffer, 1, MAX_BUFF_SZ, input)) > 0) {
    total_bytes += bytes_read;
    if (total_bytes >= end) {
        int diff = total_bytes - end;
        buffer[diff] = '\0';
        break;
    }
    fwrite(buffer, 1, bytes_read, output);
    memset(buffer, 0, MAX_BUFF_SZ);
}
output is a different file for each thread. Even when I try with just 2 threads, some characters are missing from them. I think I am close to the right solution and have something like an off-by-one error.
So the question is: how can I read a slice of a file, but in chunks? Can you please help me identify the bug in the above code and make it work?
Edit:
If MAX_BUFF_SZ were bigger than the size of the input and I had, for example, 4 threads, how should clean code look to ensure that T0 does all the work and T1, T2 and T3 do nothing?
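One way I can think of to handle this (a minimal sketch, not the assignment code; it reuses the min macro from the test program below) is to never split work finer than one chunk, clamp both bounds to the file size, and bail out of empty slices:

int slice = (fsize + tt - 1) / tt;   /* ceiling division */
if (slice < MAX_BUFF_SZ)
    slice = MAX_BUFF_SZ;             /* don't split work finer than one chunk */
int start = min(id * slice, fsize);
int end = min((id + 1) * slice, fsize);
if (start >= end)
    return;                          /* empty slice: this thread does nothing */

With fsize < MAX_BUFF_SZ, T0 then gets [0, fsize) while T1, T2 and T3 all see start >= end and return immediately.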
Some simple code that may be used to test this behavior is the following (note that it is not from the Huffman code; it is auxiliary code to test things):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <omp.h>
#define MAX_BUFF_SZ 32
#define min(a, b) \
    ({ __typeof__ (a) _a = (a); \
       __typeof__ (b) _b = (b); \
       _a < _b ? _a : _b; })
int get_filesize(char *filename) {
    FILE *f = fopen(filename, "r");
    fseek(f, 0L, SEEK_END);
    int size = ftell(f);
    fclose(f);
    return size;
}
static void compress(char *filename, int id, int tt) {
    int total_bytes = 0;
    int bytes_read;
    char *newname;
    char *buffer;
    FILE *output;
    FILE *input;
    int fsize;
    int slice;
    int start;
    int end;

    /* Room for the filename, "-", the thread id digits and the NUL. */
    newname = (char *) malloc(strlen(filename) + 16);
    sprintf(newname, "%s-%d", filename, id);

    fsize = get_filesize(filename);
    /* +1 so a chunk of exactly MAX_BUFF_SZ bytes stays NUL-terminated
       for the printf below. */
    buffer = calloc(1, MAX_BUFF_SZ + 1);
    input = fopen(filename, "r");
    output = fopen(newname, "w");

    slice = (fsize + tt - 1) / tt;
    end = min((id + 1) * slice, fsize);
    start = slice * id;
    fseek(input, start, SEEK_SET);

    /* Same chunked-read loop as above (including the bug I am asking about). */
    while ((bytes_read = fread(buffer, 1, MAX_BUFF_SZ, input)) > 0) {
        total_bytes += bytes_read;
        printf("%s\n", buffer);
        if (total_bytes >= end) {
            int diff = total_bytes - end;
            buffer[diff] = '\0';
            break;
        }
        fwrite(buffer, 1, bytes_read, output);
        memset(buffer, 0, MAX_BUFF_SZ);
    }

    fclose(output);
    fclose(input);
}
int main() {
    omp_set_num_threads(4);
    #pragma omp parallel
    {
        int tt = omp_get_num_threads();
        int id = omp_get_thread_num();
        compress("test.txt", id, tt);
    }
    return 0;
}
You can compile it with gcc test.c -o test -fopenmp. You may generate a file test.txt with some random characters, more than 32 of them (or change the max buffer size).
Edit 2: Again, my problem is reading a slice of a file in chunks, not the analysis per se. I know how to do that. It's a university course; I can't just say "IO bound, end of story, analysis complete".
Apparently I just had to take a pen and paper and draw a little scheme. After playing around with some indices, I came up with the following code (encbuff and written_bits are auxiliary variables I use, since I am actually writing bits to a file and use an intermediate buffer to limit the writes):
while ((bytes_read = fread(buffer, 1, MAX_BUFF_SZ, input)) > 0) {
    total_bytes += bytes_read;
    if (start + total_bytes > end) {
        /* This chunk overshoots the end of my slice: keep only the
           bytes that belong to it, compress them, and stop. */
        int diff = start + total_bytes - end;
        buffer[bytes_read - diff] = '\0';
        compress_chunk(buffer, t, output, encbuff, &written_bits);
        break;
    }
    compress_chunk(buffer, t, output, encbuff, &written_bits);
    memset(buffer, 0, MAX_BUFF_SZ);
}
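An alternative I considered (a sketch using the same variables and the min helper from the test program, and assuming buffer is allocated with MAX_BUFF_SZ + 1 bytes so a full chunk can still be NUL-terminated) is to clamp each fread so the loop can never read past end, which removes the truncation arithmetic:

int remaining = end - start;              /* bytes this thread must process */
while (remaining > 0) {
    int to_read = min(MAX_BUFF_SZ, remaining);
    bytes_read = fread(buffer, 1, to_read, input);
    if (bytes_read <= 0)
        break;                            /* EOF or read error */
    buffer[bytes_read] = '\0';            /* safe: buffer has MAX_BUFF_SZ + 1 bytes */
    compress_chunk(buffer, t, output, encbuff, &written_bits);
    remaining -= bytes_read;
}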
I also finished implementing the OpenMP version. For small files the serial version is faster, but starting from about 25 MB the parallel one starts to beat the serial one by roughly 35-45%. Thank you all for the advice.
Cheers!