c++ amazon-web-services mpi openmp openmpi

MPI_Bcast results in segmentation fault


Context

I've been working on a university project that requires using both OpenMP and MPI to examine a .csv file (of about 1 million lines) and extract some statistics.

I've managed to write and test a program that works fine on my machine, so I opened an AWS account to test the actual parallel performance of the program when it runs on multiple nodes.

Problem

When I run the code on a single AWS EC2 instance (Amazon Linux 2, t2.xlarge), I get a segmentation fault caused by the MPI_Bcast call, even though the code worked fine on my machine. I really want to understand this problem before extending execution to other nodes.

I've narrowed down the code that produces this error to the following:


#define FIN_PATH R"(NYPD_Motor_Vehicle_Collisions.csv)"
#define LINES 955928
#define MAX_LINE_LENGHT 500

...

int main() {
    int RANK;
    int SIZE;
    int THREAD_SUPPORT;

    MPI_Init_thread(nullptr, nullptr, MPI_THREAD_FUNNELED, &THREAD_SUPPORT);

    MPI_Comm_size(MPI_COMM_WORLD, &SIZE);
    MPI_Comm_rank(MPI_COMM_WORLD, &RANK);

    int i;

    //Initialize empty dataset
    char ** data = new char*[LINES];

    for(i = 0; i < LINES; ++i)
        data[i] = new char[MAX_LINE_LENGHT] {'\0'};


    // Populate dataset
    if (RANK == 0) {
        string line;
        ifstream fin(FIN_PATH, ios::in);

        getline(fin, line);

        for(i = 0; i < LINES; ++i) {
            getline(fin, line);
            normalize(&line);
            line.copy(data[i], line.size() + 1);
        }
        fin.close();
    }


    // Broadcast dataset to each process
    MPI_Bcast(&data[0][0], LINES * MAX_LINE_LENGHT, MPI_CHAR, 0, MPI_COMM_WORLD);


    MPI_Finalize();
    return 0;
}


As you can see, I'm reading the file and saving every char into a 2D array (because MPI cannot handle strings), which I then broadcast to every process (not elegant, but it helps me avoid major headaches down the road, e.g. implementing MPI I/O).

I don't get any error when reading the file; the segmentation fault appears only when I reintroduce the MPI_Bcast call.

I made sure that the AWS EC2 instance had enough RAM (16 GB), so I don't see why this segmentation fault would occur. I'm new to everything here (except C++ programming), so I don't have the tools to debug an error like this yet. Any insight is appreciated!


Solution

    for(i = 0; i < LINES; ++i)
        data[i] = new char[MAX_LINE_LENGHT] {'\0'};
    

    You have allocated a whole bunch of individual strings at different addresses.

    MPI_Bcast(&data[0][0], LINES * MAX_LINE_LENGHT, MPI_CHAR, 0, MPI_COMM_WORLD);
    

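    But you told MPI_Bcast to send a single object of size LINES * MAX_LINE_LENGHT starting at &data[0][0]. Nowhere did you create an object like that. &data[0][0] is the start of the first row only, which is MAX_LINE_LENGHT chars long, so MPI_Bcast runs past the end of that row into memory that does not belong to it. That is undefined behaviour, which is why it can appear to work on one machine and crash on another.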

    You need to allocate a single object at a single address containing all the data you want to send because that's what MPI_Bcast expects.
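
As a minimal sketch of what "a single object at a single address" could look like, the snippet below stores every line in one contiguous buffer and computes row pointers by offset. The FlatDataset type and its row and set members are hypothetical names introduced here for illustration; they are not part of the original code or of MPI. The MPI_Bcast call is shown only as a comment, so the snippet compiles without an MPI installation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): one contiguous block
// holding `lines` fixed-width rows, so the whole dataset lives at a single
// address and can be handed to MPI_Bcast in one call.
struct FlatDataset {
    std::size_t lines;
    std::size_t width;
    std::vector<char> buffer;  // lines * width chars, all initialized to '\0'

    FlatDataset(std::size_t lines_, std::size_t width_)
        : lines(lines_), width(width_), buffer(lines_ * width_, '\0') {
        assert(width > 0);
    }

    // Pointer to the start of row i inside the single buffer.
    char* row(std::size_t i) {
        assert(i < lines);
        return buffer.data() + i * width;
    }

    // Copy a line into row i, truncating so the row stays '\0'-terminated.
    void set(std::size_t i, const std::string& line) {
        std::size_t n = line.size() < width - 1 ? line.size() : width - 1;
        line.copy(row(i), n);
        row(i)[n] = '\0';
    }
};

// With MPI available, every rank would construct the same FlatDataset and
// the broadcast would then be a single call on the one buffer (sketch only):
//
//   FlatDataset ds(LINES, MAX_LINE_LENGHT);
//   if (RANK == 0) { /* fill rows with ds.set(i, line) */ }
//   MPI_Bcast(ds.buffer.data(), static_cast<int>(ds.lines * ds.width),
//             MPI_CHAR, 0, MPI_COMM_WORLD);
```

In the question's program this would replace the loop of per-row new char[...] allocations, with line.copy(data[i], ...) becoming ds.set(i, line). With the constants from the question, LINES * MAX_LINE_LENGHT is 477,964,000, which still fits in the int count parameter of MPI_Bcast; a larger dataset would have to be broadcast in chunks or with a derived datatype.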