mpi sigpipe sigbus

Caught a fatal signal: SIGBUS(7) on node 2/32


I'm trying to run the NAS-UPC benchmarks on a 32-node cluster.

It works fine when the problem size is small. When I move to a bigger problem size (CLASS D), I get this error (for the MG benchmark):

*** Caught a fatal signal: SIGBUS(7) on node 2/32
 p4_error: latest msg from perror: Bad file descriptor
*** Caught a signal: SIGPIPE(13) on node 0/32
    p4_error: latest msg from perror: Bad file descriptor
   p4_error: latest msg from perror: Bad file descriptor

*** FATAL ERROR: recursion failure in AMMPI_SPMDExit
*** Caught a signal: SIGPIPE(13) on node 27/32
*** Caught a signal: SIGPIPE(13) on node 20/32
*** Caught a signal: SIGPIPE(13) on node 21/32
    p4_error: latest msg from perror: Bad file descriptor
*** FATAL ERROR: recursion failure in AMMPI_SPMDExit
*** FATAL ERROR: recursion failure in AMMPI_SPMDExit
*** FATAL ERROR: recursion failure in AMMPI_SPMDExit
*** Caught a signal: SIGPIPE(13) on node 16/32
*** FATAL ERROR: recursion failure in AMMPI_SPMDExit

Can anybody explain why this is happening? Has anyone seen this error before and fixed it?

EDIT: I figured out that it is a memory-related problem, but I'm unable to allot the right amount of memory for the application at compile time.


Solution

  • I figured out that the problem is the benchmark needing more memory than I had allotted to it at compile time.
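
To get a feel for how much memory to allot, a back-of-the-envelope estimate can help. The sketch below is only an illustration: it assumes a 1024 x 1024 x 1024 grid for CLASS D, three full-size arrays of 8-byte doubles, and an even split across the 32 nodes. None of these numbers come from the post above, and the helper name `estimate_per_thread_bytes` is made up for this example. It also ignores coarser multigrid levels, ghost cells, and runtime overhead, so the real requirement will be higher.

```python
# Rough per-thread memory estimate for a multigrid run.
# All sizes are assumptions for illustration, not values read from the benchmark source.

def estimate_per_thread_bytes(n, threads, arrays=3, bytes_per_elem=8):
    """Bytes per thread for `arrays` cubic grids of side `n`, split evenly over `threads`."""
    total = arrays * (n ** 3) * bytes_per_elem
    return total / threads


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Assumed CLASS D grid side (1024) and the 32 nodes from the question.
    per_thread = estimate_per_thread_bytes(n=1024, threads=32)
    print(f"{per_thread / GIB:.2f} GiB per thread")  # prints "0.75 GiB per thread"
```

If the figure this produces is larger than the per-thread memory the build was configured with, that would be consistent with the SIGBUS seen above. How to raise that limit depends on the UPC compiler and runtime in use.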