I have a very large array (20,000,000 elements) that I would like to write to a file, unformatted. It holds an autocorrelation function.
The program is pretty quick with the -O4 optimization flag when it does not write to a file. But as soon as I write to a file, it looks like it would take over a day to finish.
The f90 program is at the end. Below are the outputs without writing to file and with writing to file.
From the timings, writing a single element of the array appears to take around 10 ms:
20,000,000 x 0.01 s = 200,000 seconds = 3,333 minutes = 55 hours
How is it possible that it takes this long to write to a file when reading only takes 45 seconds? And what can I do to improve the speed?
Notes
System: Ubuntu 20.04
Compilation line: fortran -o acorr.exe -O4 acorr.f90
No File Write
elapsed time for reading: 43.4389992
Size of Jx: 20000000
Loop Start Time: 43.5009995
correlation time magnitude 1e0 elapsed time: 43.5009995
correlation time magnitude 1e1 elapsed time: 43.5009995
correlation time magnitude 1e2 elapsed time: 43.5009995
correlation time magnitude 1e3 elapsed time: 43.5009995
correlation time magnitude 1e4 elapsed time: 43.5009995
correlation time magnitude 1e5 elapsed time: 43.5009995
correlation time magnitude 1e6 elapsed time: 43.5029984
correlation time magnitude 1e7 elapsed time: 43.5190010
elapsed time: 43.5369987
With File Write
elapsed time for reading: 43.6349983
Size of Jx: 20000000
Loop Start Time: 43.6949997
correlation time magnitude 1e0 elapsed time: 43.7319984
correlation time magnitude 1e1 elapsed time: 43.8969994
correlation time magnitude 1e2 elapsed time: 45.4980011
correlation time magnitude 1e3 elapsed time: 61.5289993
acorr.f90
PROGRAM acorr
  implicit none
  ! "sum" is deliberately not declared, so it refers to the intrinsic function
  real :: a, b, c, d, mean, var
  integer :: i, j, jsize, beginning, rate, end, end1
  real, dimension(20000000) :: Jx, Jxm, corr
  integer :: skip_lines = 4
  call system_clock(beginning, rate)
  ! reading file
  open(10, file='DiamHeat.log', status='old')
  do i = 1, skip_lines
    read(10,*)
  end do
  do i = 1, 20000000
    read(10,*) a, b, Jx(i), c, d
  end do
  call system_clock(end)
  print *, "elapsed time for reading: ", real(end - beginning) / real(rate)
  close(10)
  ! finished reading
  open(20, file='acorr.txt', form='UNFORMATTED')
  jsize = size(Jx)
  print *, "Size of Jx: ", jsize
  !print *, dot_product(Jx(10:jsize),Jx(1:jsize-10))
  ! calculate mean
  mean = sum(Jx)/jsize
  Jxm(:) = Jx(:)-mean
  ! calculate variance
  var = dot_product(Jxm,Jxm)/jsize
  ! begin autocorrelation calc
  call system_clock(end1)
  print *, "Loop Start Time: ", real(end1 - beginning) / real(rate)
  do i = 0, jsize-1
    ! calculation: lag-i autocorrelation, normalised by variance and overlap length
    corr(i+1) = dot_product(Jxm(i+1:jsize),Jxm(1:jsize-i))/var/(jsize-i)
    ! clock timing
    if (i == 1) then
      call system_clock(end)
      print *, "correlation time magnitude 1e0 elapsed time: ", real(end - beginning) / real(rate)
    else if (i == 10) then
      call system_clock(end)
      print *, "correlation time magnitude 1e1 elapsed time: ", real(end - beginning) / real(rate)
    else if (i == 100) then
      call system_clock(end)
      print *, "correlation time magnitude 1e2 elapsed time: ", real(end - beginning) / real(rate)
    else if (i == 1000) then
      call system_clock(end)
      print *, "correlation time magnitude 1e3 elapsed time: ", real(end - beginning) / real(rate)
    else if (i == 10000) then
      call system_clock(end)
      print *, "correlation time magnitude 1e4 elapsed time: ", real(end - beginning) / real(rate)
    else if (i == 100000) then
      call system_clock(end)
      print *, "correlation time magnitude 1e5 elapsed time: ", real(end - beginning) / real(rate)
    else if (i == 1000000) then
      call system_clock(end)
      print *, "correlation time magnitude 1e6 elapsed time: ", real(end - beginning) / real(rate)
    else if (i == 10000000) then
      call system_clock(end)
      print *, "correlation time magnitude 1e7 elapsed time: ", real(end - beginning) / real(rate)
    end if
  end do
  ! unformatted output takes no format specifier, so no "*" here
  write(20) corr
  close(20)
  call system_clock(end)
  print *, "elapsed time: ", real(end - beginning) / real(rate)
END PROGRAM
As @francescalus commented, the compiler seems to skip the calculation entirely unless its result is used afterwards. Adding
print *, sum(corr)
after the loop seems to make the program actually compute the dot products in the loop. The run then simply takes a long time, but the program is computing at full capacity.
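For illustration, here is a minimal sketch of that fix on a much smaller problem, so it finishes quickly. The array size n = 2000, the random stand-in data, and the output file name acorr_sketch.dat are assumptions made for this example and are not taken from the original program. The result is used after the loop, so the optimizer cannot drop the loop, and the whole array is written as a single unformatted record.

```fortran
program acorr_sketch
   implicit none
   ! Assumed small size so the O(n**2) loop finishes quickly
   integer, parameter :: n = 2000
   real, allocatable :: x(:), xm(:), corr(:)
   real :: mean, var
   integer :: i, u

   allocate(x(n), xm(n), corr(n))
   ! Stand-in data; the real program reads Jx from DiamHeat.log
   call random_number(x)

   mean = sum(x) / n
   xm = x - mean
   var = dot_product(xm, xm) / n

   ! Same autocorrelation formula as in the original loop
   do i = 0, n - 1
      corr(i+1) = dot_product(xm(i+1:n), xm(1:n-i)) / var / (n - i)
   end do

   ! Using corr after the loop keeps the compiler from discarding the loop
   print *, "sum(corr) = ", sum(corr)
   ! Lag 0 is normalised by the variance, so this should be close to 1
   print *, "corr(1)   = ", corr(1)

   ! Unformatted write: no format specifier, whole array in one record
   open(newunit=u, file='acorr_sketch.dat', form='unformatted', status='replace')
   write(u) corr
   close(u)
end program acorr_sketch
```

Each lag i costs a dot product of length N - i, so the full loop performs about N(N+1)/2 multiply-adds. For N = 20,000,000 that is roughly 2 x 10^14 operations, which would account for a very long run time regardless of the file output.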
Thanks again @francescalus