
KSH Shell script - Process file by blocks of lines


I am trying to write a shell script in a KSH environment that iterates through a source text file and processes it in blocks of lines.

So far I have come up with the code below, but it seems to run indefinitely, because the tail command does not return 0 lines when asked to retrieve lines beyond the end of the source text file.

i=1
while [[ `wc -l /path/to/block.file | awk -F' ' '{print $1}'` -gt $((i * 1000)) ]]

do
  lc=$((i * 1000))
  DA=ProcessingResult_$i.csv
  head -$lc /path/to/source.file | tail -1000 > /path/to/block.file
  cd /path/to/processing/batch
  ./process.sh #This will process /path/to/block.file
  mv /output/directory/ProcessingResult.csv /output/directory/$DA
  i=$((i + 1))
done

Before launching the above script I perform a manual 'first injection': head -$lc /path/to/source.file | tail -1000 > /path/to/temp.source.file

Any idea on how to get the script to stop after processing the last lines from the source file?

Thanks in advance to you all


Solution

  • If you do not want to create many temporary files up front before you begin processing each block, you could try the solution below. It can save a lot of disk space when processing huge files.

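    #!/usr/bin/ksh
    # Usage: extractblock.sh <block size> <input file>
    
    range=$1
    file=$2
    
    b=0; e=0; seq=1
    while true
    do
       # Lines b..e form the current block.
       b=$((e + 1)); e=$((range * seq))
    
       # Print only lines b..e, then quit so sed does not read the rest of the file.
       sed -n -e "${b},${e}p" -e "${e}q" "$file" > "${file}.temp"
    
       # An empty block means we have gone past the end of the file.
       [ -s "${file}.temp" ] || break
    
       ## process the ${file}.temp as per your need ##
    
       seq=$((seq + 1))
    done
    
    # Remove the last (empty) temporary file.
    rm -f "${file}.temp"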
    

    The above code keeps only one temporary file on disk at a time. You can pass the range (block size) and the filename as command-line arguments to the script.

    example: extractblock.sh 1000 inputfile.txt
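
To see why the loop in the question never terminates while the sed version does, here is a minimal sketch on a hypothetical five-line sample file (the file contents, the block size of 2, and the variable names are assumptions made for illustration, and `mktemp` is assumed to be available). It asks both approaches for lines 7-8, which do not exist.

```shell
#!/bin/sh
# Hypothetical sample file with 5 lines (illustration only).
sample=$(mktemp)
printf '%s\n' one two three four five > "$sample"

# Question's approach: head returns at most the whole file, so tail
# still finds 2 lines to print even though lines 7-8 do not exist.
head_tail_count=$(( $(head -n 8 "$sample" | tail -n 2 | wc -l) ))

# Answer's approach: a sed address range past the end of the file
# matches nothing, so the output is empty.
sed_count=$(( $(sed -n '7,8p' "$sample" | wc -l) ))

echo "head | tail returned $head_tail_count line(s)"
echo "sed returned $sed_count line(s)"

rm -f "$sample"
```

Because the head | tail pipeline keeps returning the last lines of the file, a test on the size of the extracted block can never signal the end of the input, whereas the empty output from sed gives the loop a reliable stop condition.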