I need to split a long text file into many smaller files. I read it in a single pass with a while read ... done <inputfile loop, and each line that matches signals the start of a new output file. In the input file, every matched line is preceded by an empty line (an extra newline character).
My problem is that every output file except the final one ends with an extra newline character. I have recreated the problem in this short example:
#!/bin/zsh
rm inputfile outputfile1 outputfile2
IFS=''
printf "section1\nsection1end\n\nsection2\nsection2end\n" >inputfile
echo " open outputfile1"
exec 3<> outputfile1
counter=1
IFS=$'\n'
while IFS= read line; do
    if [[ "$line" == "section2" ]]; then
        echo " Matched start of section2. Close outputfile1 and open outputfile2"
        exec 3>&-
        exec 3<> outputfile2
    fi
    echo "$line" >&3
    echo $counter $line
    let "counter = $counter + 1"
done <inputfile
echo " Close outputfile2"
exec 3>&-
echo
unset IFS
echo `wc -l inputfile`
echo `wc -l outputfile1`
echo `wc -l outputfile2`
echo " The above should show 5, 2, 2 as desired number of newlines in these files."
Which outputs:
open outputfile1
1 section1
2 section1end
3
Matched start of section2. Close outputfile1 and open outputfile2
4 section2
5 section2end
Close outputfile2
5 inputfile
3 outputfile1
2 outputfile2
The above should show 5, 2, 2 as desired number of newlines in these files.
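The extra line in outputfile1 is the empty line that precedes section2: it is read and written while outputfile1 is still open. One fix is to drop all empty lines. This only works if you do not need to keep empty lines in the middle of a section. Change: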
echo "$line" >&3
To:
[[ -n "$line" ]] && echo "$line" >&3
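The caveat can be seen in a minimal sketch. It uses a made-up sample whose section contains an empty line of its own, with hypothetical variable names (sample, filtered, kept), and it uses printf and read -r as a portable stand-in for the echo and read calls in the script. Both the separator line and the inner empty line are dropped:

```shell
# Sample section: an empty line in the middle plus the empty separator at the end.
sample=$(mktemp)
filtered=$(mktemp)
printf 'section1\n\nsection1end\n\n' >"$sample"

# Copy only non-empty lines, as the one-line change above does.
while IFS= read -r line; do
    if [ -n "$line" ]; then
        printf '%s\n' "$line"
    fi
done <"$sample" >"$filtered"

# Count the surviving lines: only section1 and section1end remain.
kept=$(wc -l <"$filtered" | tr -d ' ')
echo "$kept"
rm -f "$sample" "$filtered"
```

If the empty line between section1 and section1end matters, this approach is not suitable.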
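Another fix is to rewrite each file after closing it, using command substitution to trim the trailing newlines: $(<file) strips every trailing newline, and echo adds a single one back. This reads the whole file into memory, so it works best with short files. Change: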
exec 3>&-
exec 3<> outputfile2
To:
exec 3>&-
data=$(<outputfile1)
echo "$data" >outputfile1
exec 3<> outputfile2
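The trimming step can be tried on its own. This is a small sketch with a throwaway temporary file (tmpfile and trimmed_count are hypothetical names); it uses cat inside the command substitution instead of $(<file) so that it also runs under a plain POSIX sh, and printf instead of echo so that backslashes in the data are not interpreted:

```shell
# Build a file that ends with an unwanted empty line (three newlines in total).
tmpfile=$(mktemp)
printf 'section1\nsection1end\n\n' >"$tmpfile"

# Command substitution drops every trailing newline;
# printf '%s\n' then writes the data back with exactly one.
data=$(cat "$tmpfile")
printf '%s\n' "$data" >"$tmpfile"

# The file now contains two newline characters.
trimmed_count=$(wc -l <"$tmpfile" | tr -d ' ')
echo "$trimmed_count"
rm -f "$tmpfile"
```

Note that this removes all trailing empty lines, not only one, which matches the behaviour of the change shown above.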
A third fix is to delay each write by one iteration: the loop writes the line from the prior iteration, and when a new file starts, the final line held back from the prior file is not written:
#!/bin/zsh
rm -f inputfile outputfile1 outputfile2
IFS=''
printf "section1\nsection1end\n\nsection2\nsection2end\n" >inputfile
echo " open outputfile1"
exec 3<> outputfile1
counter=1
IFS=$'\n'
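priorLine=MARKER   # sentinel: nothing has been read yet
while IFS= read line; do
    if [[ "$line" == "section2" ]]; then
        echo " Matched start of section2. Close outputfile1 and open outputfile2"
        # Switch files without writing priorLine, which is the empty line before the match.
        exec 3>&-
        exec 3<> outputfile2
    elif [[ "$priorLine" != MARKER ]]; then
        # Write the line held back from the previous iteration.
        echo "$priorLine" >&3
    fi
    echo $counter $line
    let "counter = $counter + 1"
    priorLine="$line"
done <inputfile
# Flush the last held-back line into the final file.
echo "$priorLine" >&3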
echo " Close outputfile2"
exec 3>&-
echo
unset IFS
echo `wc -l inputfile`
echo `wc -l outputfile1`
echo `wc -l outputfile2`
echo " The above should show 5, 2, 2 as desired number of newlines in these files."
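The same one-line delay can be wrapped in a reusable function. The following is only a sketch under some assumptions: split_sections and its three arguments are hypothetical names, a section starts on a line that exactly equals the given pattern, and output files are numbered by appending a counter to a prefix. It uses a flag rather than the MARKER sentinel so that an input line containing the text MARKER is not skipped, and it uses read -r and printf so that backslashes in the data pass through unchanged.

```shell
# split_sections INPUT PATTERN PREFIX   (hypothetical helper)
# Starts a new file PREFIX<n> whenever a line equals PATTERN and drops
# the single line that immediately precedes each match.
split_sections() {
    input=$1
    pattern=$2
    prefix=$3
    n=1
    have_prior=0            # 0 until the first line has been read
    prior=
    : >"$prefix$n"          # create or truncate the first output file
    while IFS= read -r line; do
        if [ "$line" = "$pattern" ]; then
            # New section: discard the held-back line and switch files.
            n=$((n + 1))
            : >"$prefix$n"
        elif [ "$have_prior" -eq 1 ]; then
            # Write the line held back from the previous iteration.
            printf '%s\n' "$prior" >>"$prefix$n"
        fi
        prior=$line
        have_prior=1
    done <"$input"
    # Flush the last held-back line into the final file.
    if [ "$have_prior" -eq 1 ]; then
        printf '%s\n' "$prior" >>"$prefix$n"
    fi
}

# Usage with the sample input from the question, in a scratch directory.
workdir=$(mktemp -d)
printf 'section1\nsection1end\n\nsection2\nsection2end\n' >"$workdir/inputfile"
split_sections "$workdir/inputfile" section2 "$workdir/outputfile"
wc -l "$workdir/outputfile1" "$workdir/outputfile2"
```

Appending with >> reopens the output file for every line, which is simpler than managing a file descriptor but slower on very large inputs; keeping exec 3> as in the script above avoids that cost.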