
Splitting a file in a shell script adds unwanted newlines


I need to split a long text file into many smaller files. I use a single-pass while read ... done <inputfile loop; when a line matches, that signals the start of a new output file. In the input file, each matched line is always preceded by an empty line.

My problem is that every output file except the last one ends with an extra newline character. I have recreated the problem in this short example.

#!/bin/zsh

rm -f inputfile outputfile1 outputfile2
IFS=''
printf "section1\nsection1end\n\nsection2\nsection2end\n" >inputfile

echo "  open outputfile1"
exec 3<> outputfile1
counter=1
IFS=$'\n'

while IFS= read line; do

    if [[ "$line" == "section2" ]]; then
        echo "  Matched start of section2. Close outputfile1 and open outputfile2"
        exec 3>&-
        exec 3<> outputfile2
    fi
    echo "$line" >&3
    echo $counter $line
    let "counter = $counter + 1"
done <inputfile
echo "  Close outputfile2"
exec 3>&-

echo
unset IFS
echo `wc -l inputfile`
echo `wc -l outputfile1`
echo `wc -l outputfile2`
echo "  The above should show 5, 2, 2 as desired number of newlines in these files."

Which outputs:

  open outputfile1
1 section1
2 section1end
3
  Matched start of section2. Close outputfile1 and open outputfile2
4 section2
5 section2end
  Close outputfile2

5 inputfile
3 outputfile1
2 outputfile2
  The above should show 5, 2, 2 as desired number of newlines in these files.

Solution
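
The cause is visible in the trace in the question: line 3 of the input is the empty separator line, and the loop writes it to outputfile1 before the match on line 4 switches files. A minimal sketch of how to confirm this by inspecting the last bytes of the file; demo_out is a hypothetical stand-in for outputfile1, recreated here so the snippet runs on its own:

```sh
#!/bin/sh
# demo_out is a hypothetical stand-in for what the loop writes to outputfile1.
printf 'section1\nsection1end\n\n' > demo_out

# Count the newline characters in the file.
wc -l < demo_out

# Dump the last two bytes; two consecutive \n mean the file ends with an empty line.
tail -c 2 demo_out | od -c
```

If the last two bytes are both \n, the extra newline is the separator line itself, not something echo or the redirection added.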

  • Option 1

    Skip empty lines when writing. This only works if you don't need to keep any empty lines inside a section. Change:

        echo "$line" >&3
    

    To:

        [[ -n "$line" ]] && echo "$line" >&3
    

  • Option 2

    Rewrite each finished file using command substitution, which strips all trailing newlines; echo then adds a single one back. The whole file is held in a variable, so this works best with short files. Change:

            exec 3>&-
            exec 3<> outputfile2
    

    To:

            exec 3>&-
            data=$(<outputfile1)
            echo "$data" >outputfile1
            exec 3<> outputfile2
    

  • Option 3

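    Have the loop write the line from the previous iteration instead of the current one. When a new file starts, discard the pending line (the empty separator that ended the previous file) rather than writing it. The last line is written after the loop ends: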

    #!/bin/zsh
    
    rm -f inputfile outputfile1 outputfile2
    IFS=''
    printf "section1\nsection1end\n\nsection2\nsection2end\n" >inputfile
    
    echo "  open outputfile1"
    exec 3<> outputfile1
    counter=1
    IFS=$'\n'
    
    priorLine=MARKER
    while IFS= read line; do
        if [[ "$line" == "section2" ]]; then
            echo "  Matched start of section2. Close outputfile1 and open outputfile2"
            exec 3>&-
            exec 3<> outputfile2
        elif [[ "$priorLine" != MARKER ]]; then
            echo "$priorLine" >&3
        fi
        echo $counter $line
        let "counter = $counter + 1"
        priorLine="$line"
    done <inputfile
    echo "$priorLine" >&3
    echo "  Close outputfile2"
    exec 3>&-
    
    echo
    unset IFS
    echo `wc -l inputfile`
    echo `wc -l outputfile1`
    echo `wc -l outputfile2`
    echo "  The above should show 5, 2, 2 as desired number of newlines in these files."
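
A possible variation on Option 3, sketched in portable sh rather than zsh: instead of holding back every line, count empty lines as they arrive and write them only if the same section continues. Empty lines inside a section survive, while those directly before a section start (and any at the very end of the input) are dropped. The names pending and out are introduced for this sketch, and it appends with >> instead of keeping file descriptor 3 open, so treat it as an illustration under those assumptions rather than a drop-in replacement.

```sh
#!/bin/sh
# Sketch: buffer empty lines and emit them only if the current section continues.
printf 'section1\nsection1end\n\nsection2\nsection2end\n' > inputfile

out=outputfile1       # current output file (name introduced for this sketch)
: > "$out"            # create or truncate the first output file
pending=0             # empty lines seen but not yet written

while IFS= read -r line; do
    if [ -z "$line" ]; then
        # Hold the empty line back until we know what follows it.
        pending=$((pending + 1))
        continue
    fi
    if [ "$line" = "section2" ]; then
        # New section: drop the empty lines that preceded it and switch files.
        pending=0
        out=outputfile2
        : > "$out"
    fi
    # Same section continues (or nothing was pending): flush held empty lines.
    while [ "$pending" -gt 0 ]; do
        printf '\n' >> "$out"
        pending=$((pending - 1))
    done
    printf '%s\n' "$line" >> "$out"
done < inputfile
# Empty lines still pending here (at the end of the input) are intentionally dropped.

wc -l inputfile outputfile1 outputfile2
```

Using printf '%s\n' instead of echo avoids zsh's echo interpreting backslash sequences in the data, and read -r keeps backslashes in the input intact.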