I have 4 cores == 4 threads (omp_get_max_threads()), so I created 4 sections (OMP sections) to break one huge file into 4 sub-partitions, which I then want to feed to 4 different threads for further processing (there are other approaches, but I wanted to try this one). In each section I open the same large file independently, assuming that each section will have its own independent file descriptor.
The problem is that only 3 of the 4 sections write to their partition files; the remaining section does not write a single byte. Which section fails varies randomly from run to run.
I also tried sharing the same file pointer among the sections, but the problem persisted and performance deteriorated, so I went back to a separate file pointer in each section.
#include<iostream>
#include<omp.h>
#include<fstream>
#include<string>
#include<sstream>
using namespace std;
int main()
{
int no_of_threads = omp_get_max_threads();
int partition_size = 1000000000/no_of_threads;
long double word_length_counter;
string temp_word_holder;
word_length_counter=0;
cout<<"\nno of threads "<<no_of_threads;
#pragma omp parallel private(word_length_counter,temp_word_holder) shared(partition_size)
{
#pragma omp sections
{
#pragma omp section
{
ifstream main_file;
ofstream temp_file_holder;
main_file.open("generated.txt",ios::in);
temp_file_holder.open("partition0.txt",ios::out);
while(word_length_counter <= partition_size)
{
main_file>>temp_word_holder;
word_length_counter += temp_word_holder.length();
temp_file_holder<<temp_word_holder<<endl;
}
main_file.close();
temp_file_holder.close();
}
#pragma omp section
{
ifstream main_file1;
ofstream temp_file_holder1;
main_file1.open("generated.txt",ios::in);
temp_file_holder1.open("partition1.txt",ios::out);
main_file1.seekg(partition_size);
while(word_length_counter <= partition_size)
{
main_file1>>temp_word_holder;
word_length_counter += temp_word_holder.length();
temp_file_holder1<<temp_word_holder<<endl;
}
main_file1.close();
temp_file_holder1.close();
}
#pragma omp section
{
ifstream main_file2;
ofstream temp_file_holder2;
main_file2.open("generated.txt",ios::in);
temp_file_holder2.open("partition2.txt",ios::out);
main_file2.seekg((partition_size*2));
while(word_length_counter <= partition_size)
{
main_file2>>temp_word_holder;
word_length_counter += temp_word_holder.length();
temp_file_holder2<<temp_word_holder<<endl;
}
main_file2.close();
temp_file_holder2.close();
}
#pragma omp section
{
ifstream main_file3;
ofstream temp_file_holder3;
main_file3.open("generated.txt",ios::in);
temp_file_holder3.open("partition3.txt",ios::out);
main_file3.seekg((partition_size-1)*3);
while(word_length_counter <= partition_size && !main_file3.eof())
{
main_file3>>temp_word_holder;
word_length_counter += temp_word_holder.length();
temp_file_holder3<<temp_word_holder<<endl;
}
main_file3.close();
temp_file_holder3.close();
}
}
}
#pragma omp barrier
cout<<"\npartitions generated";
}
The culprit is that word_length_counter is private and not properly initialised. For each private variable, a thread-private copy is created and "... initialized, or has an undefined initial value, as if it had been locally declared without an initializer" (from the OpenMP specification, section 2.21.3). As a result, in some threads the initial value may be greater than partition_size, or it may be NaN, for which the comparison word_length_counter <= partition_size is always false. In either case the while-loop never executes. Here is a simple program that reproduces the effect:
#include <cstdio>
#include <omp.h>
using namespace std;
int main()
{
long double word_length_counter;
word_length_counter=0;
#pragma omp parallel private(word_length_counter)
{
printf("%d %Lf\n", omp_get_thread_num(), word_length_counter);
}
}
Running the code:
$ clang++ -fopenmp -o foo foo.cpp
$ OMP_NUM_THREADS=4 ./foo
0 nan
1 nan
2 -nan
3 nan
$ OMP_NUM_THREADS=4 ./foo
1 nan
2 -nan
0 nan
3 nan
$ OMP_NUM_THREADS=4 ./foo
0 nan
1 nan
2 -nan
3 nan
$ OMP_NUM_THREADS=4 ./foo
2 nan
0 nan
3 nan
1 nan
As you can see, with my particular version of Clang and with 4 threads, the initial value tends to be NaN in all threads except thread 2, where it tends to be -NaN. Those are random values (whatever used to be in the memory where the thread stack is allocated), and those particular values are a system artefact.
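To connect those NaN values to the empty partition file, here is a minimal sketch of the question's loop condition in isolation. The helper names loop_would_run and nan_start_runs_loop are hypothetical and exist only for illustration; the value 250000000 assumes the question's 1000000000 bytes split across 4 threads.

```cpp
#include <limits>

// The while-condition from the question, isolated for illustration.
// (Hypothetical helper, not part of the original program.)
bool loop_would_run(long double word_length_counter, int partition_size) {
    return word_length_counter <= partition_size;
}

// Hypothetical helper: evaluates the condition with a NaN starting value,
// as a thread with an uninitialised private copy might see it.
bool nan_start_runs_loop() {
    const long double garbage = std::numeric_limits<long double>::quiet_NaN();
    return loop_would_run(garbage, 250000000);
}
```

Any ordered comparison involving NaN is false, so a section that starts with a NaN counter skips its loop entirely and only creates an empty output file.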
A simple fix is to move the initialisation word_length_counter=0; inside the parallel region, or to replace
private(word_length_counter,temp_word_holder)
with
firstprivate(word_length_counter) private(temp_word_holder)
firstprivate(X) initialises the private copies of X with the value the original variable had before the program encountered the parallel region.
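A minimal sketch of the firstprivate behaviour, under some assumptions: the function name observe_firstprivate and the seen vector are hypothetical, and a parallel for loop is used here instead of sections purely so that each iteration can record what its thread's private copy contains.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: records the value that each iteration reads from its
// thread's firstprivate copy of word_length_counter.
std::vector<long double> observe_firstprivate(int slots) {
    assert(slots >= 0);
    long double word_length_counter = 0;          // initialised before the region
    std::vector<long double> seen(slots, -1.0L);  // -1 marks "not written yet"

    // firstprivate copies the value 0 into every thread's private variable.
    #pragma omp parallel for firstprivate(word_length_counter)
    for (int i = 0; i < slots; ++i) {
        seen[i] = word_length_counter;  // each iteration writes its own element
    }
    return seen;
}
```

Every element of the returned vector should be 0. Replacing firstprivate with private would leave the copies uninitialised, which is the situation in the question.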
Since you don't use word_length_counter outside the parallel region, good programming practice is to move the entire declaration inside.
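One way this could look is sketched below, assuming the per-section work is factored into a helper. The names copy_partition and split_in_four are hypothetical and not part of the original code. The sketch also makes two changes beyond the fix itself, both of which are assumptions about the intent: the fourth partition starts at partition_size * 3 rather than (partition_size-1)*3, and the loop tests the result of the read before using the word. It does not address words that straddle a partition boundary.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: copies whitespace-separated words from in_path,
// starting at byte `offset`, into out_path until more than `budget`
// characters have been copied or the input runs out.
// Returns the number of word characters copied.
long long copy_partition(const std::string& in_path, const std::string& out_path,
                         long long offset, long long budget) {
    assert(offset >= 0 && budget >= 0);
    std::ifstream in(in_path);
    std::ofstream out(out_path);
    in.seekg(offset);

    long long copied = 0;  // declared locally, so every thread starts from 0
    std::string word;      // declared locally, so no private clause is needed
    while (copied <= budget && in >> word) {  // stop when the read fails
        copied += static_cast<long long>(word.length());
        out << word << '\n';
    }
    return copied;
}

// Same four-way split as in the question, one section per partition.
void split_in_four(const std::string& in_path, long long partition_size) {
    #pragma omp parallel sections
    {
        #pragma omp section
        copy_partition(in_path, "partition0.txt", 0, partition_size);
        #pragma omp section
        copy_partition(in_path, "partition1.txt", partition_size, partition_size);
        #pragma omp section
        copy_partition(in_path, "partition2.txt", partition_size * 2, partition_size);
        #pragma omp section
        copy_partition(in_path, "partition3.txt", partition_size * 3, partition_size);
    }
}
```

Because the counter and the word buffer live inside the helper, each call gets its own properly initialised variables, and neither private nor firstprivate is needed.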