I have files that start with Unix-delimited text lines and then switch to binary. The text portion ends with a specific string followed by a newline; everything after that is binary.
I need to write the text portion to one file and the remaining binary data to another. Here's what I have so far, but I'm stuck on how to switch to binary mode and write the remainder.
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;
my ($inputfilename, $outtextfilename, $outbinfilename) = @ARGV;
open(my $in, '<:encoding(UTF-8)', $inputfilename)
    or die "Could not open file '$inputfilename' $!";
open my $outtext, '>', $outtextfilename or die;
my $outbin;
open $outbin, '>', $outbinfilename or die;
binmode $outbin;
while (my $aline = <$in>) {
    chomp $aline;
    if($aline =~ /\<\/FileSystem\>/) { # a match indicates the end of the text portion - the rest is binary
        print $outtext "$aline\n"; # last line of the text portion
        print "$aline\n"; # last line of the text portion
        close ($outtext);
        binmode $in; # change input file to binary?
        # what do I do here to copy all remaining bytes in file as binary to $outbin??
        die;
    } else {
        print $outtext "$aline\n"; # a line of the text portion
        print "$aline\n"; # a line of the text portion
    }
}
close ($in);
close ($outbin);
Edit - final code:
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;
my ($inputfilename, $outtextfilename, $outbinfilename) = @ARGV;
open(my $in, '<', $inputfilename)
    or die "Could not open file '$inputfilename' $!";
open my $outtext, '>', $outtextfilename or die;
my $outbin;
open $outbin, '>', $outbinfilename or die;
binmode $outbin;
print "Starting File\n";
while (my $aline = <$in>) {
    chomp $aline;
    if($aline =~ /\<\/FileSystem\>/) { # a match indicates the end of the text portion - the rest is binary
        print $outtext "$aline\n"; # last line of the text portion
        print "$aline\n"; # last line of the text portion
        close ($outtext);
        binmode $in; # change input file to binary
        my $cont = '';
        print "processing binary portion";
        while (1) {
            my $success = read $in, $cont, 1000000, length($cont);
            die $! if not defined $success;
            last if not $success;
            print ".";
        }
        close ($in);
        print $outbin $cont;
        print "\nDone\n";
        close $outbin;
        last;
    } else {
        print $outtext "$aline\n"; # a line of the text portion
        print "$aline\n"; # a line of the text portion
    }
}
The easiest way is probably to use binary I/O for everything. That way we don't have to worry about switching file modes halfway through, and on Unix there is no difference between text and binary mode anyway (except when it comes to encodings, but here we just want to copy bytes unchanged).
Depending on how big the plain text portion of the file is, we could either process it line by line or read it all into memory at once.
#!/usr/bin/perl
use strict;
use warnings;
my ($inputfilename, $outtextfilename, $outbinfilename) = @ARGV;
open my $in_fh, '<:raw', $inputfilename
    or die "$0: can't open $inputfilename for reading: $!\n";
open my $out_txt_fh, '>:raw', $outtextfilename
    or die "$0: can't open $outtextfilename for writing: $!\n";
open my $out_bin_fh, '>:raw', $outbinfilename
    or die "$0: can't open $outbinfilename for writing: $!\n";
# process text part
while (my $line = readline $in_fh) {
    print $out_txt_fh $line;
    last if $line =~ m{</FileSystem>};
}
# process binary part
while (read $in_fh, my $buffer, 4096) {
    print $out_bin_fh $buffer;
}
This version of the code processes the text part line by line and the binary part in chunks of 4096 bytes (not taking internal buffering into account).
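One thing the chunked loop above does not distinguish is end of file from a read error: `read` returns 0 at end of file and undef on failure, and both end a plain `while (read ...)` loop. The following is a minimal sketch of a stricter copy, not part of the original script. The helper name `copy_remaining` and the in-memory filehandles used for the demonstration are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: copy everything left on $in_fh to $out_fh in chunks,
# dying on I/O errors instead of silently stopping.
sub copy_remaining {
    my ($in_fh, $out_fh, $chunk_size) = @_;
    $chunk_size ||= 4096;
    my $total = 0;
    while (1) {
        my $got = read $in_fh, my $buffer, $chunk_size;
        die "read failed: $!\n" unless defined $got;   # undef means an I/O error
        last if $got == 0;                              # 0 means end of file
        print {$out_fh} $buffer or die "write failed: $!\n";
        $total += $got;
    }
    return $total;
}

# Demonstration with in-memory filehandles standing in for real files.
my $payload = join '', map { chr } 0 .. 255;
my $input   = "header\n</FileSystem>\n" . $payload;
open my $in_fh, '<:raw', \$input or die "can't open in-memory input: $!\n";

# Skip the text part the same way the script above does.
while (my $line = readline $in_fh) {
    last if $line =~ m{</FileSystem>};
}

my $binary = '';
open my $out_fh, '>:raw', \$binary or die "can't open in-memory output: $!\n";
my $copied = copy_remaining($in_fh, $out_fh, 64);
close $out_fh or die "close failed: $!\n";

print "copied $copied bytes\n";
```

Mixing `readline` and `read` on the same handle is safe here because both go through Perl's buffered I/O layer; `sysread` would bypass that buffer and should not be combined with them.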
Alternatively, if the character sequence marking the end of the text part is exactly "</FileSystem>\n", we can be a bit cheeky:
# process text part
{
    local $/ = "</FileSystem>\n";
    if (defined(my $line = readline $in_fh)) {
        print $out_txt_fh $line;
    }
}
We temporarily switch the input record separator $/ (the end-of-line marker) from "\n" to "</FileSystem>\n" and read a single "line", which encompasses all of the text part. This assumes the text part is small enough to comfortably fit into memory. The rest of the script is the same.
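One caveat with this trick: if the marker never occurs, `readline` returns everything up to end of file, so the binary data would end up in the text output. Below is a hedged sketch of how that case could be detected. The helper name `read_text_part` and the sample data are hypothetical, and in-memory filehandles stand in for real files.

```perl
use strict;
use warnings;

# Hypothetical helper: read the text part in one go and confirm that it
# really ends with the marker. Returns undef if the marker was not found.
sub read_text_part {
    my ($in_fh, $marker) = @_;
    local $/ = $marker;                      # one "line" now ends at the marker
    my $text = readline $in_fh;
    return unless defined $text;             # empty input
    return unless $text =~ /\Q$marker\E\z/;  # hit end of file without the marker
    return $text;
}

my $marker = "</FileSystem>\n";

# Input that contains the marker, followed by three binary bytes.
my $good = "<FileSystem>\nname=a\n</FileSystem>\n\x00\x01\xFF";
open my $good_fh, '<:raw', \$good or die "can't open in-memory input: $!\n";
my $text = read_text_part($good_fh, $marker);
my $rest = do { local $/; readline $good_fh };   # slurp the binary remainder

# Input without the marker: everything was consumed, so the caller should bail out.
my $bad = "no marker here\n\x00\x01";
open my $bad_fh, '<:raw', \$bad or die "can't open in-memory input: $!\n";
my $missing = read_text_part($bad_fh, $marker);

print defined $missing ? "marker found\n" : "marker missing\n";
```

When the helper returns undef the handle has already been read to the end, so the sensible reaction in a real script is probably to die with an error rather than to continue with the binary copy.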