
Perl find and replace multiple (huge) strings in one shot


Based on a mapping file, I need to search for a string and, if it is found, append the replacement string to the end of the line. I'm traversing the mapping file line by line and using the Perl one-liner below to append the strings.

Issues:

1. Huge number of find & replace entries: The mapping file has a huge number of entries (~7000), and the Perl one-liner takes about a second per entry, which boils down to roughly an hour to complete the entire replacement.

2. Not a simple find and replace: It's not a simple find & replace; if the find string is present, the replacement string is appended to the end of the line. If there is no efficient way to process this, I would even consider replacing rather than appending.

My environment is Windows 7 64-bit and I'm using ActivePerl. No *nix support.

File Samples

Map.csv

    findStr1,RplStr1
    findStr2,RplStr2
    findStr3,RplStr3
    .....
    findStr7000,RplStr7000

input.csv

    col1,col2,col3,findStr1,....col-N
    col1,col2,col3,findStr2,....col-N
    col1,col2,col3,FIND-STR-NOT-EXIST,....col-N

output.csv (Expected Output)

    col1,col2,col3,findStr1,....col-N,**RplStr1**
    col1,col2,col3,findStr2,....col-N,**RplStr2**
    col1,col2,col3,FIND-STR-NOT-EXIST,....col-N

Perl Code Snippet

One-Liner

    perl -pe '/findStr/ && s/$/RplStr/' file.csv
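
(Note: on Windows cmd.exe, single quotes are not shell quoting characters, so in practice the one-liner has to use double quotes, as the generated command in the snippet below already does:)

    perl -pe "/findStr/ && s/$/RplStr/" file.csv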


    open( INFILE, '<', $MarketMapFile ) or die "Error occurred: $!";
    my @data = <INFILE>;

    my $cnt = 1;
    foreach my $line (@data) {
        eval {
            # Remove the end-of-line character.
            $line =~ s/\n//g;
            my ( $eNodeBID, $MarketName ) = split( ',', $line );

            # One perl process per mapping entry; each -i.bak invocation
            # rewrites the entire CSV file.
            my $exeCmd = 'perl -i.bak -p -e "/' . $eNodeBID . '\(M\)/ && s/$/,' . $MarketName . '/;" ' . $CSVFile;
            print "\n$cnt: Replacing $eNodeBID with $MarketName, cmd: $exeCmd";
            system($exeCmd);
            $cnt++;
        };
    }
    close(INFILE);

Solution

  • To do this in a single pass through your input CSV, it's easiest to store your mapping in a hash. 7000 entries is not particularly huge, but if you're worried about storing all of that in memory you can use Tie::File::AsHash.

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    use Text::CSV;
    use Tie::File::AsHash;
    
    tie my %replace, 'Tie::File::AsHash', 'map.csv', split => ',' or die $!;
    
    my $csv = Text::CSV->new({ binary => 1, auto_diag => 1, eol => $/ })
            or die Text::CSV->error_diag;
    
    open my $in_fh, '<', 'input.csv' or die $!;
    open my $out_fh, '>', 'output.csv' or die $!;
    
    while (my $row = $csv->getline($in_fh)) {
        push @$row, $replace{$row->[3]};
        $csv->print($out_fh, $row);
    }
    
    untie %replace;
    close $in_fh;
    close $out_fh;
    

    map.csv

    foo,bar
    apple,orange
    pony,unicorn
    

    input.csv

    field1,field2,field3,pony,field5,field6
    field1,field2,field3,banana,field5,field6
    field1,field2,field3,apple,field5,field6
    

    output.csv

    field1,field2,field3,pony,field5,field6,unicorn
    field1,field2,field3,banana,field5,field6,
    field1,field2,field3,apple,field5,field6,orange
    

    I don't recommend screwing up your CSV format by only appending fields to matching lines, so I add an empty field if a match isn't found.
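
    If you need the output to match the expected sample in the question exactly (rows with no match left completely untouched), a minimal variation of the loop above appends only when the lookup field actually has a mapping:

    while (my $row = $csv->getline($in_fh)) {
        # Append the mapped value only when the fourth field has a mapping;
        # otherwise write the row back unchanged.
        push @$row, $replace{$row->[3]} if defined $replace{$row->[3]};
        $csv->print($out_fh, $row);
    }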

    To use a regular hash instead of Tie::File::AsHash, simply replace the tie statement with

    open my $map_fh, '<', 'map.csv' or die $!;
    
    my %replace = map { chomp; split /,/ } <$map_fh>;
    
    close $map_fh;
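
    The Text::CSV loop keys on the fourth field ($row->[3]). If the find string can sit in any column, or anywhere on the line as in the original one-liner, here is a rough single-pass sketch that skips CSV parsing and matches one combined regex per line (assuming %replace has been loaded as in the plain-hash snippet above; it has the same substring-matching caveats as the original approach):

    # Build one alternation of all find strings, longest first so that
    # e.g. findStr10 is tried before findStr1.
    my $find_re = join '|', map quotemeta, sort { length $b <=> length $a } keys %replace;
    $find_re = qr/($find_re)/;

    open my $in_fh,  '<', 'input.csv'  or die $!;
    open my $out_fh, '>', 'output.csv' or die $!;

    while (my $line = <$in_fh>) {
        chomp $line;
        # Append the mapped value for the first find string present on the line.
        $line .= ",$replace{$1}" if $line =~ $find_re;
        print {$out_fh} "$line\n";
    }

    close $in_fh;
    close $out_fh;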