Why is Valgrind ignoring my --error-exitcode option?

I understood that the --error-exitcode option can be used to cause valgrind to return a non-zero exit code when leaks are found. The valgrind documentation says:

--error-exitcode=<number> [default: 0] Specifies an alternative exit code to return if Valgrind reported any errors in the run. When set to the default value (zero), the return value from Valgrind will always be the return value of the process being simulated. When set to a nonzero value, that value is returned instead, if Valgrind detects any errors. This is useful for using Valgrind as part of an automated test suite, since it makes it easy to detect test cases for which Valgrind has reported errors, just by inspecting return codes. When set to a nonzero value and Valgrind detects no error, the return value of Valgrind will be the return value of the program being simulated.

But when I run valgrind on one of my test programs with this option, the output clearly shows a leak, yet valgrind returns 0. What am I doing wrong?

ed@mikado:~/NCEPLIBS-g2/b/tests$ valgrind --error-exitcode=1 --leak-check=full --show-leak-kinds=all ./test_getgb2_mem_4 && echo $?
==72557== Memcheck, a memory error detector
==72557== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==72557== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==72557== Command: ./test_getgb2_mem_4
==72557== 
 Opening GRIB2 file data/gep19.t00z.pgrb2a.0p50_bcf144                                                                                      
  getgb2 returned           99
  getgb2 returned           99
  getgb2 returned           99
  getgb2 returned           99
  getgb2 returned           99
 Opening GRIB2 file data/geavg.t00z.pgrb2a.0p50_mecomf144                                                                                   
  getgb2 returned           99
  getgb2 returned           99
  getgb2 returned           99
  getgb2 returned           99
  getgb2 returned           99
 Opening GRIB2 file data/gec00.t00z.pgrb2a.0p50.f144                                                                                        
 REC  PD5 PD6 PD7 YEAR MN  DY  HR  F1  F2  FU  E2  E3  E4   LEN      MAX        MIN       Sample 
  56   7 100 1000  23   4  30   0 144   0   0   1   2   1  259920     364.85    -458.59     165.82
  47   7 100  925  23   4  30   0 144   0   0   1   2   1  259920     956.03     172.94     766.25
  41   7 100  850  23   4  30   0 144   0   0   1   2   1  259920    1644.57     855.80    1413.74
  36   7 100  700  23   4  30   0 144   0   0   1   2   1  259920    3223.11    2388.85    2870.44
  31   7 100  500  23   4  30   0 144   0   0   1   2   1  259920    5910.78    4752.64    5280.56
 Opening GRIB2 file data/gegfs.t00z.pgrb2a.0p50.f144                                                                                        
 REC  PD5 PD6 PD7 YEAR MN  DY  HR  F1  F2  FU  E2  E3  E4   LEN      MAX        MIN       Sample 
  57   7 100 1000  23   4  30   0 144   0  10   1   1   1  259920     518.26    -490.32     161.32
  48   7 100  925  23   4  30   0 144   0  10   1   1   1  259920    1021.39     147.21     762.75
  42   7 100  850  23   4  30   0 144   0  10   1   1   1  259920    1692.23     830.58    1411.62
  37   7 100  700  23   4  30   0 144   0  10   1   1   1  259920    3237.82    2361.94    2871.40
  32   7 100  500  23   4  30   0 144   0  10   1   1   1  259920    5921.03    4759.57    5286.23
 Opening GRIB2 file data/gegfs.t00z.pgrb2a.0p50_mef144                                                                                      
  getgb2 returned           99
  getgb2 returned           99
  getgb2 returned           99
  getgb2 returned           99
  getgb2 returned           99
 Opening GRIB2 file data/gep19.t00z.pgrb2a.0p50.f144                                                                                        
 REC  PD5 PD6 PD7 YEAR MN  DY  HR  F1  F2  FU  E2  E3  E4   LEN      MAX        MIN       Sample 
  56   7 100 1000  23   4  30   0 144   0   0   3  19   1  259920     306.98    -454.66      93.58
  47   7 100  925  23   4  30   0 144   0   0   3  19   1  259920     952.33     163.82     695.26
  41   7 100  850  23   4  30   0 144   0   0   3  19   1  259920    1640.95     825.74    1343.70
  36   7 100  700  23   4  30   0 144   0   0   3  19   1  259920    3214.48    2306.39    2803.36
  31   7 100  500  23   4  30   0 144   0   0   3  19   1  259920    5913.73    4742.57    5224.79
==72557== 
==72557== HEAP SUMMARY:
==72557==     in use at exit: 300,000 bytes in 6 blocks
==72557==   total heap usage: 4,268 allocs, 4,262 frees, 59,833,667 bytes allocated
==72557== 
==72557== 300,000 bytes in 6 blocks are still reachable in loss record 1 of 1
==72557==    at 0x4E050B5: malloc (vg_replace_malloc.c:431)
==72557==    by 0x12E71A: getg2ir_ (getg2ir.f:55)
==72557==    by 0x118150: getidx_ (getidx.F90:117)
==72557==    by 0x115B4A: getgb2_ (getgb2.f:107)
==72557==    by 0x10A950: gb2read_ (test_getgb2_mem.F90:165)
==72557==    by 0x10DA04: MAIN__ (test_getgb2_mem.F90:43)
==72557==    by 0x10DA86: main (test_getgb2_mem.F90:50)
==72557== 
==72557== LEAK SUMMARY:
==72557==    definitely lost: 0 bytes in 0 blocks
==72557==    indirectly lost: 0 bytes in 0 blocks
==72557==      possibly lost: 0 bytes in 0 blocks
==72557==    still reachable: 300,000 bytes in 6 blocks
==72557==         suppressed: 0 bytes in 0 blocks
==72557== 
==72557== For lists of detected and suppressed errors, rerun with: -s
==72557== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
0
ed@mikado:~/NCEPLIBS-g2/b/tests$

Is this because valgrind does not regard still-reachable memory as an error?

If so, how do I detect such leaks with valgrind and get a non-zero exit code, so that I can use my CI to run the tests?

Solution

Yes. By default, memory that is still reachable when the process terminates is not counted as an error (hence "ERROR SUMMARY: 0 errors"). As the Valgrind FAQ says of memory that is still reachable...

your program is probably ok -- it didn't free some memory it could have. This is quite common and often reasonable.

Since the operating system reclaims all of a process's memory when it exits, there is little practical advantage to freeing memory yourself just before the program ends, though one could argue it is neater.

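Which leak kinds count as errors is controlled by --errors-for-leak-kinds, whose default is definite,possible, so still-reachable blocks never trigger --error-exitcode on their own. If you want Valgrind to treat reachable memory as an error too, pass --errors-for-leak-kinds=all in addition to your current flags: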

$ valgrind --leak-check=full --show-leak-kinds=all --errors-for-leak-kinds=all --error-exitcode=1 ./example
==103420== Memcheck, a memory error detector
==103420== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==103420== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==103420== Command: ./example
==103420== 
==103420== 
==103420== HEAP SUMMARY:
==103420==     in use at exit: 123 bytes in 1 blocks
==103420==   total heap usage: 1 allocs, 0 frees, 123 bytes allocated
==103420== 
==103420== 123 bytes in 1 blocks are still reachable in loss record 1 of 1
==103420==    at 0x484182F: malloc (vg_replace_malloc.c:431)
==103420==    by 0x401133: main (in /home/foo/example)
==103420== 
==103420== LEAK SUMMARY:
==103420==    definitely lost: 0 bytes in 0 blocks
==103420==    indirectly lost: 0 bytes in 0 blocks
==103420==      possibly lost: 0 bytes in 0 blocks
==103420==    still reachable: 123 bytes in 1 blocks
==103420==         suppressed: 0 bytes in 0 blocks
==103420== 
==103420== For lists of detected and suppressed errors, rerun with: -s
==103420== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
$ echo $?
1