I'm trying to run the NAS-UPC benchmarks on a 32-node cluster.
They work fine when the problem size is small. When I move up to a bigger problem size (CLASS D), I get this error (for the MG benchmark):
*** Caught a fatal signal: SIGBUS(7) on node 2/32
p4_error: latest msg from perror: Bad file descriptor
*** Caught a signal: SIGPIPE(13) on node 0/32
p4_error: latest msg from perror: Bad file descriptor
p4_error: latest msg from perror: Bad file descriptor
*** FATAL ERROR: recursion failure in AMMPI_SPMDExit
*** Caught a signal: SIGPIPE(13) on node 27/32
*** Caught a signal: SIGPIPE(13) on node 20/32
*** Caught a signal: SIGPIPE(13) on node 21/32
p4_error: latest msg from perror: Bad file descriptor
*** FATAL ERROR: recursion failure in AMMPI_SPMDExit
*** FATAL ERROR: recursion failure in AMMPI_SPMDExit
*** FATAL ERROR: recursion failure in AMMPI_SPMDExit
*** Caught a signal: SIGPIPE(13) on node 16/32
*** FATAL ERROR: recursion failure in AMMPI_SPMDExit
Can anybody explain why this is happening? Has anyone seen this error before and fixed it?
EDIT: I figured out that it is a memory-related problem: the benchmark needs more memory than I had allotted to it at compile time. However, I'm still unable to allot the right amount of memory to the application at compile time.