
process_vm_readv fails after a certain number of iovec entries in MPI


I'm using process_vm_readv to copy data from one process to another in MPI.
I found that the program starts reading garbage once the number of iovec entries passed to process_vm_readv exceeds a certain count (in this case 1024).
I'm not sure what is going on. Is the kernel running out of memory, or is something wrong in my code?
Or does process_vm_readv have an upper limit on the number of iovec entries?
I generate a strided vector pattern (8 bytes out of every 16 bytes) for the iovec arrays.
The program runs until 1GB is filled with this pattern on both processes.
sbuf and rbuf are each allocated 1GB of memory.
The program runs on a machine with more than 24GB of memory.

void do_test( int slen, int rlen, int scount, int rcount, void *sbuf, void *rbuf ){
int rank, err;
double timers[REP];
MPI_Win win;
pid_t pid;

MPI_Comm_rank( MPI_COMM_WORLD, &rank );
if( rank == 0 ){
    MPI_Win_create( NULL, 0, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win );

    int send_iovcnt;
    struct iovec *send_iov;

    struct iovec *iov = malloc( sizeof(struct iovec) * scount );
    for( int p = 0; p < scount; p++ ){
        iov[p].iov_base = (char*)rbuf + p * 16;
        iov[p].iov_len = 8;
    }

    MPI_Recv( &pid, sizeof(pid_t), MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE );
    MPI_Recv( &send_iovcnt, 1, MPI_INT, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE );

    send_iov = malloc( sizeof(struct iovec) * send_iovcnt );

    MPI_Recv( send_iov, sizeof(struct iovec) * send_iovcnt, MPI_BYTE, 1, 2, MPI_COMM_WORLD, MPI_STATUS_IGNORE );
    for( int i = 0; i < REP; i++ ){
        cache_flush();
        timers[i] = MPI_Wtime();
        MPI_Win_fence( 0, win );
        /* the local array holds scount entries, the remote array send_iovcnt */
        process_vm_readv( pid, iov, scount, send_iov, send_iovcnt, 0 );
        MPI_Win_fence( 0, win );
        cache_flush();
        timers[i] = MPI_Wtime() - timers[i];
    }
    free(send_iov);
    free(iov);

    print_result( 8 * scount, REP, timers );
} else if( rank == 1 ){
    MPI_Win_create( sbuf, slen, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win );

    struct iovec *iov = malloc( sizeof(struct iovec) * rcount );

    for( int p = 0; p < rcount; p++ ){
        iov[p].iov_base = (char*)sbuf + p * 16;
        iov[p].iov_len = 8;
    }

    pid = getpid();
    MPI_Send( &pid, sizeof(pid_t), MPI_BYTE, 0, 0, MPI_COMM_WORLD );
    MPI_Send( &rcount, 1, MPI_INT, 0, 1, MPI_COMM_WORLD );
    MPI_Send( iov, rcount * sizeof(struct iovec), MPI_BYTE, 0, 2, MPI_COMM_WORLD );
    for( int i = 0; i < REP; i++ ){
        cache_flush();
        MPI_Win_fence( 0, win );
        MPI_Win_fence( 0, win );
    }
    free(iov);

}
}

Solution

  • The man page for process_vm_readv(2) contains the following text:

    The values specified in the liovcnt and riovcnt arguments must be less than or equal to IOV_MAX (defined in <limits.h> or accessible via the call sysconf(_SC_IOV_MAX)).

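    On my Linux system, the value of IOV_MAX (ultimately defined in /usr/include/x86_64-linux-gnu/bits/uio_lim.h) is 1024. A call that passes more entries than this fails with EINVAL and returns -1 without copying anything, so the receive buffer appears to contain garbage unless the return value is checked.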
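
One way to stay under the limit is to split the transfer into batches of at most IOV_MAX entries and check every return value. The following is a minimal sketch, not a drop-in fix for the code in the question: readv_chunked is a hypothetical helper name, it assumes that local[i] and remote[i] always have the same length (as they do with the 8-bytes-of-16 pattern above), and the fallback value of 1024 is an assumption used only if sysconf cannot report the limit.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: read `count` local/remote iovec pairs from process
 * `pid`, passing at most IOV_MAX entries to each process_vm_readv call.
 * Assumes local[i].iov_len == remote[i].iov_len for every i, so that batch
 * boundaries line up on both sides.
 * Returns the total number of bytes read, or -1 with errno set by the
 * failing call. */
ssize_t readv_chunked(pid_t pid, const struct iovec *local,
                      const struct iovec *remote, size_t count)
{
    long limit = sysconf(_SC_IOV_MAX);
    if (limit <= 0)
        limit = 1024; /* assumption: fall back to the usual Linux value */

    ssize_t total = 0;
    size_t done = 0;
    while (done < count) {
        size_t n = count - done;
        if (n > (size_t)limit)
            n = (size_t)limit;

        ssize_t got = process_vm_readv(pid, local + done, n,
                                       remote + done, n, 0);
        if (got < 0)
            return -1; /* e.g. EINVAL, EPERM, ESRCH, EFAULT */

        /* A partial transfer is possible; the caller can detect it by
         * comparing the returned total with the expected byte count. */
        total += got;
        done += n;
    }
    return total;
}
```

In the question's rank 0 branch, the single process_vm_readv call could then become something like readv_chunked( pid, iov, send_iov, send_iovcnt ), with the result compared against 8 * send_iovcnt so that a failed or short read is reported instead of silently leaving rbuf untouched.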