I need to process many files (with CRLF line endings) that look like this:
$ cat -v file1.txt
1$XXX$ZZZ$$$$$$$$^M
2$AAA$BBB$$$$$$$$^M
$ cat -v file2.txt
1$4668$$$^M
2$46$$$^M
I need to remove the trailing $ sign, change the separator from $ to , and enclose every field in double quotes.
Desired output (no matter if line endings are CRLF or LF):
$ cat newname1.csv
"1","XXX","ZZZ","","","","","","",""
"2","AAA","BBB","","","","","","",""
$ cat newname2.csv
"1","4668","",""
"2","46","",""
Here is my attempt:
#!/usr/bin/perl
use strict;
use warnings;
my %inputs = qw(
file1 file1.txt
file2 file2.txt
);
my %outputs = qw(
file1 newname1.csv
file2 newname2.csv
);
for my $key (keys %inputs) {
    open my $in, '<', $inputs{$key} or die $!;
    open my $out, '>', $outputs{$key} or die $!;
    while(<$in>) {
        local $, = ',';
        local $\ = "\n";
        s/\$$//;
        my @row = split /\$/;
        print $out map qq("$_"), @row;
    }
    close $in or die $!;
    close $out or die $!;
}
On Linux, it gives files with a CRLF enclosed in the last column and LF line endings:
$ cat -v newname1.csv
"1","XXX","ZZZ","","","","","","","","^M
"
"2","AAA","BBB","","","","","","","","^M
"
$ cat -v newname2.csv
"1","4668","","","^M
"
"2","46","","","^M
"
I guess the issue is due to the CRLF line endings. Therefore, I tried changing '<' to '<:crlf' to open my files, which gives the same result, and changing the regex that removes the trailing $ sign (e.g. to \$\r\n and \$\R, which both result in files without the empty trailing columns).
How can I fix my code to get my desired output?
Update: This answer was written for the first two versions of the question. I only undeleted it because the OP asked me to. It may not fit the current version of the question. Some things might be outright wrong.
This has nothing to do with line endings being CRLF. It is just a split issue.
If I add a Dumper print to your code, right after the line where you split into the variable @row:
my @row = split /\$/;
use Data::Dumper;
print Dumper \@row;
I get (for the first line of file2.txt):
$VAR1 = [
'1',
'4668',
'',
'',
'
'
];
Here you can see that the trailing newline is the last field in your split.
When you then treat these split results as genuine column values in your data, you get one extra field for the newline.
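To make the effect easy to reproduce without the input files, here is a minimal sketch that splits a single hard-coded line. The sample string is an assumption modelled on the first line of file2.txt, with the CRLF still attached as it would be when read on Linux.

```perl
use strict;
use warnings;

# One input line as read from a CRLF file (line ending still attached).
my $line = "1\$4668\$\$\$\r\n";

# Without a LIMIT, split strips trailing *empty* fields only. The last
# field here is "\r\n", which is not empty, so every field is kept.
my @row = split /\$/, $line;

printf "%d fields, last field is the line ending: %s\n",
    scalar @row, ($row[-1] eq "\r\n" ? "yes" : "no");
```

The line ending counts as a fifth field, which is why it ends up wrapped in quotes in the output file.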
I do not see where you are removing the last $. Maybe that is something you misunderstood?
Suggested solution:
If this is CSV data, you should use a CSV module to handle it. The Text::CSV module does this well. Here is some sample code that will handle your inputs:
use strict;
use warnings;
use Text::CSV qw(csv);
my %inputs = qw(
file1 file1.txt
file2 file2.txt
);
my %outputs = qw(
file1 newname1.csv
file2 newname2.csv
);
for my $key (keys %inputs) {
    my $aoa = csv (in => $inputs{$key}, sep_char => '$');
    csv (in => $aoa, out => $outputs{$key}, sep_char => ',', always_quote => 1);
}
Update:
Since you edited your question and added a line of code that changes everything and makes your own claimed output "wrong", I've found the following:
If you have only trailing empty fields, split will delete those empty fields by default. This can be fixed, as specified in the documentation for split:
If LIMIT is negative, it is treated as if it were instead arbitrarily large; as many fields as possible are produced.
If LIMIT is omitted (or, equivalently, zero), then it is usually treated as if it were instead negative but with the exception that trailing empty fields are stripped (empty leading fields are always preserved); if all fields are empty, then all fields are considered to be trailing (and are thus stripped in this case).
In other words, you can change
split /\$/;
to
split /\$/, $_, -1;
to fix your missing trailing empty fields.
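Putting the pieces together, the per-line processing could look something like the sketch below. The helper name to_csv_line is hypothetical (it is not in the question), and the sketch assumes that no field contains a double quote, since it does no CSV escaping.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one '$'-separated input line into one
# quoted CSV line. Assumes fields never contain a double quote.
sub to_csv_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                # drop the line ending, CRLF or LF
    $line =~ s/\$\z//;                   # drop the single trailing '$'
    my @row = split /\$/, $line, -1;     # -1 keeps trailing empty fields
    return join ',', map { qq("$_") } @row;
}

print to_csv_line("1\$4668\$\$\$\r\n"), "\n";   # prints "1","4668","",""
```

Removing the line ending before the trailing $ matters: with the "\r" still in place, a pattern anchored at the end of the line cannot see the $ as the last character.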
The only problem is that you have not reported having this problem (yet). So, I guess we need to wait for you to update your question.