Trick to read data from hard drive faster between successive compilations

I am developing code in a compiled language (Fortran 95) that does certain calculations on a huge galaxy catalog. Each time I implement some change, I compile and run the code, and it takes about 3 minutes just to read the ASCII file with the galaxy data from disk. This is a waste of time.

Had I started this project in IDL or Matlab, then it would be different, because the variables containing the array data would be kept in memory between different compilations.

However, I think something could be done to speed up that unnerving reading from disk, like having the files in a fake RAM partition or something.


  • Instead of going into details on RAM disks, I propose you switch from ASCII databases to binary ones. Here is a very simplistic example: an array of random numbers, stored as ASCII (ASCII.txt) and as binary data (binary.bin):

    program writeArr
      use,intrinsic :: ISO_Fortran_env, only: REAL64
      implicit none
      real(REAL64),allocatable :: tmp(:,:)
      integer :: uFile, i
      allocate( tmp(10000,10000) )
      ! Fill the array with random numbers
      call random_number(tmp)
      ! Formatted write
      open(newunit=uFile, file='ASCII.txt',form='formatted', &
           status='replace',action='write')
      do i=1,size(tmp,2)
        write(uFile,*) tmp(:,i)
      enddo !i
      close(uFile)
      ! Unformatted write
      open(newunit=uFile, file='binary.bin',form='unformatted', &
           status='replace',action='write')
      write(uFile) tmp
      close(uFile)
    end program

    Here is the result in terms of sizes:

     :> ls -lah ASCII.txt binary.bin 
    -rw-rw-r--. 1 elias elias 2.5G Feb 20 20:59 ASCII.txt
    -rw-rw-r--. 1 elias elias 763M Feb 20 20:59 binary.bin

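    So, you save a factor of ~3.35 in terms of storage: the binary file is essentially just the raw 10000 x 10000 x 8 bytes = 800 MB (763 MiB) of array data. Now comes the fun part: reading it back in...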

    program readArr
      use,intrinsic :: ISO_Fortran_env, only: REAL64
      implicit none
      real(REAL64),allocatable :: tmp(:,:)
      integer :: uFile, i
      integer :: count_rate, iTime1, iTime2
      allocate( tmp(10000,10000) )
      ! Get the count rate
      call system_clock(count_rate=count_rate)
      ! Formatted read
      open(newunit=uFile, file='ASCII.txt',form='formatted', &
           status='old',action='read')
      call system_clock(iTime1)
      do i=1,size(tmp,2)
        read(uFile,*) tmp(:,i)
      enddo !i
      call system_clock(iTime2)
      close(uFile)
      print *,'ASCII  read ',real(iTime2-iTime1,REAL64)/real(count_rate,REAL64)
      ! Unformatted read
      open(newunit=uFile, file='binary.bin',form='unformatted', &
           status='old',action='read')
      call system_clock(iTime1)
      read(uFile) tmp
      call system_clock(iTime2)
      close(uFile)
      print *,'Binary read ',real(iTime2-iTime1,REAL64)/real(count_rate,REAL64)
    end program

    The result is

     ASCII  read    37.250999999999998     
     Binary read    1.5460000000000000   

    So, a factor of >24!

    So instead of thinking of anything else, please switch to a binary file format first.
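
    For the edit-compile-run cycle in the question, one way to apply this is to parse the ASCII catalog only once and keep a binary copy next to it, so that later runs take the fast path. The following is only a minimal sketch of that idea: the file names (catalog_demo.txt, catalog_demo.bin), the routine loadCatalog and the 3 x 1000 array shape are made up for the demo, and it generates its own small stand-in catalog so that it can be run as is. It assumes a compiler that supports newunit= and access='stream' (Fortran 2003/2008).

    ```fortran
    program cachedRead
      use, intrinsic :: ISO_Fortran_env, only: REAL64
      implicit none
      ! Hypothetical file names and array shape, chosen only for this demo
      character(len=*), parameter :: asciiFile = 'catalog_demo.txt'
      character(len=*), parameter :: cacheFile = 'catalog_demo.bin'
      integer, parameter :: nCols = 3, nRows = 1000
      real(REAL64), allocatable :: first(:,:), second(:,:)
      logical :: usedCache
      integer :: u, i

      allocate( first(nCols,nRows), second(nCols,nRows) )

      ! Stand-in for the real catalog: a small ASCII file of random numbers
      call random_number(first)
      open(newunit=u, file=asciiFile, form='formatted', status='replace', action='write')
      do i = 1, nRows
        write(u,*) first(:,i)
      enddo
      close(u)

      ! Remove any stale cache so that the first load takes the slow path
      open(newunit=u, file=cacheFile, status='replace')
      close(u, status='delete')

      call loadCatalog(first, usedCache)
      print *, 'first load from cache:  ', usedCache
      call loadCatalog(second, usedCache)
      print *, 'second load from cache: ', usedCache

      ! The binary round trip is exact, so both loads must agree bit for bit
      if (.not. all(first == second)) error stop 'cache does not match ASCII data'
      print *, 'both loads returned identical data'

    contains

      subroutine loadCatalog(cat, fromCache)
        real(REAL64), intent(out) :: cat(:,:)
        logical,      intent(out) :: fromCache
        integer :: uIn, uOut, j

        inquire(file=cacheFile, exist=fromCache)
        if (fromCache) then
          ! Fast path: a single unformatted read of the whole array
          open(newunit=uIn, file=cacheFile, form='unformatted', access='stream', &
               status='old', action='read')
          read(uIn) cat
          close(uIn)
        else
          ! Slow path: parse the ASCII file once, then write the binary cache
          open(newunit=uIn, file=asciiFile, form='formatted', status='old', action='read')
          do j = 1, size(cat,2)
            read(uIn,*) cat(:,j)
          enddo
          close(uIn)
          open(newunit=uOut, file=cacheFile, form='unformatted', access='stream', &
               status='replace', action='write')
          write(uOut) cat
          close(uOut)
        endif
      end subroutine loadCatalog

    end program cachedRead
    ```

    The first call should report that it did not use the cache and the second that it did. Note that this sketch never invalidates the cache: if the ASCII catalog changes, the .bin file has to be deleted by hand (or the check extended, for example by comparing file sizes or time stamps).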