I have a process master that spawns N child processes that communicate with the parent through unnamed pipes. I must be able to:
My problem does not concern the O.S. concepts, only the file operations :S
Perhaps fseek? I can't mmap the log file (some have more than 1GB).
I would appreciate some ideas. Thank you in advance
EDIT: I'm trying to make the children read the respective lines without using fseek and the value of chunks, so, could someone please tell me if this is valid? :
//somewhere in the parent process:
FILE* logFile = fopen(filename, "r");
while (fgets(line, 1024, logFile) != NULL) {
num_lines++;
}
rewind(logFile);
int prev = 0;
for (i = 0; i < maps_nr; i++) {
struct send_to_Map request;
request.fp = logFile;
request.lower = lowLimit;
request.upper = highLimit;
if (i == 0)
request.minLine = 0;
else
request.minLine = 1 + prev;
if(i!=maps_nr-1)
request.maxLine = (request.minLine + num_lines / maps_nr) - 1;
else
request.maxLine = (request.minLine + num_lines / maps_nr)+(num_lines%maps_nr);
prev = request.maxLine;
}
//write this structure to respective pipe
//child process:
while(1) {
...
//reads the structure to pipe (and knows which lines to read)
int n=0, counter=0;
while (fgets(line, 1024, logFile) != NULL){
if (n>=minLine and n<=maxLine)
counter+= process(Line);//returns 1 if IP was found, in that line, between the low and high limit
n++;
}
//(...)
}
I don't know if it's going to work, I just to make it work! Even like this, is it possible to outperform a single process reading the whole file and printing the total number of ips found in the log file?
If you don't care about dividing the file exactly evenly, and the distribution of line lengths is somewhat even over the entire file, you can avoid reading the entire file in the parent once.
start
and reads chunk_size
bytes.That's a rough sketch of the strategy.
Edited to simplify things a bit.
Edit: here's some untested code for step 3, and step 4 below. This is all untested, and I haven't been careful about off-by-one errors, but it gives you an idea of the usage of fseek
and ftell
, which sounds like what you are looking for.
// Assume FILE* f is open to the file, chunk_size is the average expected size,
// child_num is the id of the current child, spawn_child() is a function that
// handles the logic of spawning a child and telling it where to start reading,
// and how much to read. child_chunks[] is an array of structs to keep track of
// where the chunks start and how big they are.
if(fseek(f, child_num * chunk_size, SEEK_SET) < 0) { handle_error(); }
int ch;
while((ch = fgetc(f)) != FEOF && ch != '\n')
{/*empty*/}
// FIXME: needs to handle EOF properly.
child_chunks[child_num].end = ftell(f); // FIXME: needs error check.
child_chunks[child_num+1].start = child_chunks[child_num].end + 1;
spawn_child(child_num);
Then in your child (step 4), assume the child has access to child_chunks[]
and knows its child_num
:
void this_is_the_child(int child_num)
{
/* ... */
fseek(f, child_chunks[child_num].start, SEEK_SET); // FIXME: handle error
while(fgets(...) && ftell(f) < child_chunks[child_num].end)
{
}
}