
MPI vs sequential code - Issue with freeing arrays


I get a strange difference between the sequential and MPI versions of a small code that computes values on a grid.

The sequential version looks like this:

int main() {

   /* Array */
   double **x;
 
   /* Allocation of 2D arrays */
   x = malloc(size_tot_y*sizeof(*x));

   for (i=0;i<=size_tot_y-1;i++) {
      x[i] = malloc(size_tot_x*sizeof(**x));
   }

   /* Do various computations */

   /* End of code */

   /* Free all arrays */
   for (i=0;i<=size_tot_y-1;i++) {
      free(x[i]);
   }
   free(x);

   return 0;

}

This sequential version works fine, and all arrays (x, x0) seem to be freed correctly.

Now, if I take the MPI version, which looks like this:

int main() {

   /* Array */
   double **x;
   double *xfinal;

   /* Allocate size_tot_y row pointers */
   x = malloc(size_tot_y*sizeof(*x));

   /* Allocate one contiguous block holding the whole 2D array */
   x[0] = malloc(size_tot_x*size_tot_y*sizeof(**x));

   /* Loop on rows */
   for (j=1;j<size_tot_y;j++) {
      /* Point row j at offset j*size_tot_x inside the contiguous block */
      x[j] = x[0] + j*size_tot_x;
   }

   /* Do various computations */

   /* End of MPI code */

   /* Free all arrays */
   for (i=0;i<=size_tot_y-1;i++) {
      free(x[i]);
   }
   free(x);

   return 0;

}

At run time, I get the following error:

[machine1:04130] *** Process received signal ***
[machine1:04130] Signal: Segmentation fault (11)
[machine1:04130] Signal code: Address not mapped (1)
[machine1:04130] Failing at address: 0x7f179c020838
[machine1:04131] *** Process received signal ***
[machine1:04131] Signal: Segmentation fault (11)
[machine1:04131] Signal code: Address not mapped (1)
[machine1:04131] Failing at address: 0x7ff0b417c838
[machine1:04132] *** Process received signal ***
[machine1:04132] Signal: Segmentation fault (11)
[machine1:04132] Signal code: Address not mapped (1)
[machine1:04132] Failing at address: 0x7f8560001838
[machine1:04133] *** Process received signal ***
[machine1:04133] Signal: Segmentation fault (11)
[machine1:04133] Signal code: Address not mapped (1)
[machine1:04133] Failing at address: 0x7f22f415f838
[machine1:04134] *** Process received signal ***
[machine1:04140] *** Process received signal ***
   
          [machine1:04134] Signal: Segmentation fault (11)
          [machine1:04134] Signal code: Address not mapped (1)
          [machine1:04134] Failing at address: 0x7f4e3c0d3838
          [machine1:04142] *** Process received signal ***
          [machine1:04142] Signal: Segmentation fault (11)
          [machine1:04142] Signal code: Address not mapped (1)
          [machine1:04142] Failing at address: 0x7ff0d4064838
          [machine1:04140] Signal: Segmentation fault (11)
          [machine1:04140] Signal code: Address not mapped (1)
          [machine1:04140] Failing at address: 0x7fb2941c3838
          [machine1:04129] *** Process received signal ***
          [machine1:04129] Signal: Segmentation fault (11)
          [machine1:04129] Signal code: Address not mapped (1)
          [machine1:04129] Failing at address: 0x7f9150049838
          [machine1:04142] [machine1:04134] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7f4e48e55890]
          [machine1:04134] [machine1:04129] [ 0] [machine1:04130] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x[machine1:04131] [ 0] [machine1:04132] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0([machine1:04140] [ 1] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0/lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7f91550a8890]
          [machine1:04129] [ 1] f890)[0x7f179f424890]
          [machine1:04130] [ 1] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7ff0b777e890]
          [machine1:04131] [ 1] [machine1:04133] [ 0] +0xf890)[0x7f8564847890]
          [machine1:04132] [ 1] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x14)[0x7f4e48b17614]
          [machine1:04134] (+0xf890)[0x7fb2979c7890]
          /lib/x86_64-linux-gnu/libc.so.6(cfree+0x14)[0x7f179f0e6614]
          [machine1:04130] [ 2] ./explicitPar[0x401c48]
          /lib/x86_64-linux-gnu/libpthread.so.0[ 2] ./explicitPar[0x401c48]
          [machine1:04134] [ 3] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x14)[0x7f8564509614]
          [machine1:04132] (+0xf890/lib/x86_64-linux-gnu/libc.so.6(cfree+0x14)[0x7f9154d6a614]
          [machine1:04129] [machine1:04140] [ 1] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x14)[0x7ff0b7440614]
          [machine1:04131] [machine1:04130] [ 3] /lib/x86_64-linux-gnu/libc.so.6(/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x[ 2] ./explicitPar[0x401c48]
          [machine1:04132] [ 3] [ 2] ./explicitPar[0x401c48]
          [machine1:04129] [ 3] [ 2] ./explicitPar[0x401c48]
          [machine1:04131] [ 3] __libc_start_main+0xf5)[0x7f179f08bb45]
          [machine1:04130] [ 4] ./explicitPar[0x400e49]
          [machine1:04130] *** End of error message ***
          f5)[0x7f4e48abcb45]
          [machine1:04134] )[0x7f22f8bb2890]
          [machine1:04133] /lib/x86_64-linux-gnu/libc.so.6/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7ff0b73e5b45[ 4] ./explicitPar[0x400e49]
          /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[ 1] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f9154d0fb45]
          [machine1:04129] ]
          [machine1:04131] [ 4] ./explicitPar[0x7f85644aeb45]
          [machine1:04132] /lib/x86_64-linux-gnu/libc.so.6(cfree[ 0] [ 4] ./explicitPar[0x400e49]
          [machine1:04129] *** End of error message ***
          (cfree+0x14)[0x7fb297689614]
          [machine1:04140] [ 2] ./explicitPar[0x401c48[machine1:04134] *** End of error message ***
          [0x400e49]
          [machine1:04131] *** End of error message ***
          [ 4] ./explicitPar[0x400e49]
          [machine1:04132] *** End of error message ***
          +0x14)[0x7f22f8874614]
          [machine1:04133] ]
          [machine1:04140] [ 3] [ 2] ./explicitPar/lib/x86_64-linux-gnu/libc.so.6[0x401c48]
          [machine1:04133] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fb29762eb45]
          [machine1:04140] [ 4] (__libc_start_main+0xf5)[0x./explicitPar[0x7f22f8819b45]
          [machine1:04133] 400e49]
          [machine1:04140] *** End of error message ***
          [ 4] ./explicitPar[0x400e49]
          [machine1:04133] *** End of error message ***
          /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7ff0d9907890]
          [machine1:04142] [ 1] --------------------------------------------------------------------------
          mpirun noticed that process rank 1 with PID 0 on node machine1 exited on signal 11 (Segmentation fault).

If I free the arrays with only:

   free(x);
   

i.e. with this part commented out:

/*for (i=0;i<=size_tot_y-1;i++) {
      free(x[i]);      
   }
 */

then I don't get the error above, so the problem comes from the way the arrays are freed in the MPI version.

Why is the second way of freeing the arrays wrong? I would have thought that the arrays are freed the same way in both cases, but apparently not.


Solution

  • Array allocation and de-allocation must be symmetrical.

    You declared your 2D arrays as double **, so they are really arrays of pointers, each pointing to an array of double. In the sequential version, you issued one malloc() for the array of row pointers, and then one malloc() per row. Your rows will not be in contiguous memory, but this is just fine, and each malloc() is matched by exactly one free().

    This layout is generally not suitable with MPI because you likely pass your 2D array to some MPI functions that expect a contiguous data layout. So you issued one malloc() for the array of row pointers (nothing changed so far), and then one single malloc() for all the rows, and you filled the first array with pointers into the second one. Only x[0] was returned by malloc(); the other x[j] point into the middle of that block, so calling free() on them is undefined behavior, which here shows up as the segmentation fault in free(). As a consequence, you must issue only two free() calls when deallocating a 2D array.

    So the correct way of deallocating the x array is:

    free(x[0]);
    free(x);
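
One way to keep allocation and deallocation symmetrical is to wrap both in a pair of helpers. The following is only a minimal sketch: the names alloc_2d and free_2d are hypothetical (they do not appear in the question), and it assumes the same layout as the MPI version, i.e. one array of row pointers plus one contiguous block of doubles.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate a rows x cols grid as ONE contiguous block
   of doubles, plus an array of row pointers into that block.
   Returns NULL on failure or if either dimension is zero. */
static double **alloc_2d(size_t rows, size_t cols)
{
    if (rows == 0 || cols == 0)
        return NULL;

    /* First malloc(): the array of row pointers */
    double **a = malloc(rows * sizeof(*a));
    if (a == NULL)
        return NULL;

    /* Second malloc(): all the data, contiguous in memory */
    a[0] = malloc(rows * cols * sizeof(**a));
    if (a[0] == NULL) {
        free(a);
        return NULL;
    }

    /* Rows 1..rows-1 point inside the block; they are NOT separate allocations */
    for (size_t j = 1; j < rows; j++)
        a[j] = a[0] + j * cols;

    return a;
}

/* Hypothetical helper: exactly two free() calls, mirroring the two malloc() calls */
static void free_2d(double **a)
{
    if (a == NULL)
        return;
    free(a[0]);   /* the contiguous data block */
    free(a);      /* the array of row pointers */
}
```

With these helpers, the MPI version would call x = alloc_2d(size_tot_y, size_tot_x) at the start and free_2d(x) at the end, and x[0] (or &x[0][0]) can be used wherever a pointer to the contiguous data is needed.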