I use Fortran for scientific computation on an HPC cluster. When we submit a job to an HPC job scheduler, we also specify a wall-clock time limit for it. However, if the job is still writing output data when the time is up, it is terminated mid-write, which leaves 'NUL' values in the data and causes trouble for post-processing.
So, could we build in an internal mechanism so that the job stops itself gracefully some time before the wall-clock limit is reached?
Related Question: How to skip reading "NUL" value in MATLAB's textscan function?
Once I understood what you are asking, I realized that I implemented similar functionality in my own program very recently (commit https://bitbucket.org/LadaF/elmm/commits/f10a1b3421a3dd14fdcbe165aa70bf5c5001413f). However, I still have to set the time limit manually.
The most important part: time_stepping%clock_time_limit is the time limit in seconds. Count the number of system clock ticks corresponding to that:
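! Query the clock resolution (ticks per second) and the largest tick count.
call system_clock(count_rate = timer_rate)
call system_clock(count_max = timer_max_count)
! Convert the limit to ticks, capped just below count_max so it stays reachable.
timer_count_time_limit = int( min(time_stepping%clock_time_limit &
                                    * real(timer_rate, knd), &
                                  real(timer_max_count, knd) * 0.999_dbl) &
                            , dbl)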
Start the timer:
call system_clock(count = time_steps_timer_count_start)
Check the timer and, if the time is up, exit the main loop with error_exit set to .true.:
if (mod(time_step, time_stepping%check_period) == 0) then
  if (master) then
    ! Read the current tick count before comparing with the limit.
    call system_clock(count = time_steps_timer_count_2)
    error_exit = time_steps_timer_count_2 - time_steps_timer_count_start > timer_count_time_limit
    if (error_exit) write(*,*) "Maximum clock time exceeded."
  end if
  ! MPI_Bcast error_exit from the master to the other processes here.
  if (error_exit) exit
end if
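The pieces above can be put together in a small serial program. This is only a minimal sketch, not the code from the commit: the names clock_time_limit, check_period, max_steps and work are hypothetical, the 0.2 s limit is chosen just to make the demo finish quickly, and the MPI broadcast is left out.

```fortran
program time_limit_demo
  use iso_fortran_env, only: int64, real64
  implicit none

  ! Hypothetical settings for this sketch (not taken from the original code).
  real(real64), parameter :: clock_time_limit = 0.2_real64  ! seconds
  integer, parameter :: check_period = 100       ! check the clock every N steps
  integer, parameter :: max_steps = 1000000000   ! upper bound on the loop

  integer(int64) :: timer_rate, timer_max_count
  integer(int64) :: count_start, count_now, count_limit
  integer :: time_step
  logical :: error_exit
  real(real64) :: work

  ! Clock resolution (ticks per second) and largest representable count.
  call system_clock(count_rate = timer_rate, count_max = timer_max_count)
  if (timer_rate <= 0) error stop "No system clock available."

  ! Time limit expressed in ticks, capped just below count_max.
  count_limit = int(min(clock_time_limit * real(timer_rate, real64), &
                        real(timer_max_count, real64) * 0.999_real64), int64)

  ! Start the timer.
  call system_clock(count = count_start)
  error_exit = .false.
  work = 0.0_real64

  do time_step = 1, max_steps
    ! Stand-in for the real computation of one time step.
    work = work + sin(real(time_step, real64))

    ! Only look at the clock every check_period steps to keep the overhead low.
    if (mod(time_step, check_period) == 0) then
      call system_clock(count = count_now)
      error_exit = count_now - count_start > count_limit
      if (error_exit) then
        write(*,*) "Maximum clock time exceeded at step ", time_step
        exit
      end if
    end if
  end do

  ! Final output is written here, while the job is still alive.
  write(*,*) "Finished cleanly, work = ", work
end program time_limit_demo
```

Because the loop is left with an ordinary exit, the output routines that follow it run to completion instead of being cut off by the scheduler. In practice the limit should be set somewhat below the scheduler's wall time so that there is enough room to finish writing.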
Now, you may want to get the time limit from your scheduler automatically. How to do that varies between job schedulers. There is typically an environment variable like $PBS_WALLTIME. See Get walltime in a PBS job script, but check your scheduler's manual.
You can read this variable from Fortran using GET_ENVIRONMENT_VARIABLE().
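As a rough sketch of that last step, the program below reads the variable and turns it into a time limit. It assumes that PBS_WALLTIME holds the wall time as a plain number of seconds, which you should verify for your scheduler; default_limit and safety_margin are hypothetical values introduced only for this illustration.

```fortran
program walltime_from_env
  use iso_fortran_env, only: real64
  implicit none

  ! Assumed for this sketch: the variable name, the fallback and the margin.
  character(*), parameter :: var_name = "PBS_WALLTIME"
  real(real64), parameter :: default_limit = 3600.0_real64  ! seconds, hypothetical fallback
  real(real64), parameter :: safety_margin = 300.0_real64   ! seconds reserved for final output

  character(len=64) :: value
  integer :: length, stat, io_stat
  real(real64) :: walltime, clock_time_limit

  walltime = default_limit

  call get_environment_variable(var_name, value, length, stat)
  ! stat == 0: the variable exists and fits into value; length > 0: it is not empty.
  if (stat == 0 .and. length > 0) then
    read(value(1:length), *, iostat = io_stat) walltime
    ! Fall back if the content is not a plain number of seconds.
    if (io_stat /= 0) walltime = default_limit
  end if

  ! Stop early enough to leave time for writing the output.
  clock_time_limit = max(walltime - safety_margin, 0.0_real64)
  write(*,*) "Stopping the time loop after ", clock_time_limit, " s"
end program walltime_from_env
```

The resulting clock_time_limit can then be used in place of the manually set limit. If your scheduler reports the wall time in another form, for example hh:mm:ss, the read statement has to be replaced by a parser for that form.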