
Perl find and replace multiple (huge) strings in one shot


Based on a mapping file, I need to search for a string and, if it is found, append the replacement string to the end of the line. I'm traversing the mapping file line by line and using the Perl one-liner below to append the strings.

Issues:

1. Huge number of find & replace entries: The mapping file has a huge number of entries (~7000), and the Perl one-liner takes about a second per entry, which boils down to roughly an hour to complete the entire replacement.

2. Not a simple find and replace: It's not a simple find & replace; if the find string is present, the replacement string is appended to the end of the line. If there is no efficient way to process this, I would even consider replacing rather than appending.

My environment is Windows 7 64-bit and I'm using ActivePerl. No *nix support.

File Samples

Map.csv

    findStr1,RplStr1
    findStr2,RplStr2
    findStr3,RplStr3
    .....
    findStr7000,RplStr7000

input.csv

    col1,col2,col3,findStr1,....col-N
    col1,col2,col3,findStr2,....col-N
    col1,col2,col3,FIND-STR-NOT-EXIST,....col-N

output.csv (Expected Output)

    col1,col2,col3,findStr1,....col-N,**RplStr1**
    col1,col2,col3,findStr2,....col-N,**RplStr2**
    col1,col2,col3,FIND-STR-NOT-EXIST,....col-N

Perl Code Snippet

One-Liner

    perl -pe '/findStr/ && s/$/RplStr/' file.csv
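
(Note: on Windows cmd.exe, single quotes are not shell quoting characters, so in practice the one-liner has to use double quotes, as the generated command in the snippet below already does:)

    perl -pe "/findStr/ && s/$/RplStr/" file.csv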


    open( INFILE, '<', $MarketMapFile ) or die "Error occurred: $!";
    my @data = <INFILE>;

    my $cnt = 1;
    foreach my $line (@data) {
        eval {
            # Remove the end-of-line character.
            $line =~ s/\n//g;
            my ( $eNodeBID, $MarketName ) = split( ',', $line );

            # One perl process per mapping entry; each -i.bak invocation
            # rewrites the entire CSV file.
            my $exeCmd = 'perl -i.bak -p -e "/' . $eNodeBID . '\(M\)/ && s/$/,' . $MarketName . '/;" ' . $CSVFile;
            print "\n$cnt: Replacing $eNodeBID with $MarketName, cmd: $exeCmd";
            system($exeCmd);
            $cnt++;
        };
    }
    close(INFILE);

Solution

  • To do this in a single pass through your input CSV, it's easiest to store your mapping in a hash. 7000 entries is not particularly huge, but if you're worried about storing all of that in memory you can use Tie::File::AsHash.

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    use Text::CSV;
    use Tie::File::AsHash;
    
    tie my %replace, 'Tie::File::AsHash', 'map.csv', split => ',' or die $!;
    
    my $csv = Text::CSV->new({ binary => 1, auto_diag => 1, eol => $/ })
            or die Text::CSV->error_diag;
    
    open my $in_fh, '<', 'input.csv' or die $!;
    open my $out_fh, '>', 'output.csv' or die $!;
    
    while (my $row = $csv->getline($in_fh)) {
        push @$row, $replace{$row->[3]};
        $csv->print($out_fh, $row);
    }
    
    untie %replace;
    close $in_fh;
    close $out_fh;
    

    map.csv

    foo,bar
    apple,orange
    pony,unicorn
    

    input.csv

    field1,field2,field3,pony,field5,field6
    field1,field2,field3,banana,field5,field6
    field1,field2,field3,apple,field5,field6
    

    output.csv

    field1,field2,field3,pony,field5,field6,unicorn
    field1,field2,field3,banana,field5,field6,
    field1,field2,field3,apple,field5,field6,orange
    

    I don't recommend screwing up your CSV format by only appending fields to matching lines, so I add an empty field if a match isn't found.
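
    If you need the output to match the expected sample in the question exactly (rows with no match left completely untouched), a minimal variation of the loop above appends only when the lookup field actually has a mapping:

    while (my $row = $csv->getline($in_fh)) {
        # Append the mapped value only when the fourth field has a mapping;
        # otherwise write the row back unchanged.
        push @$row, $replace{$row->[3]} if defined $replace{$row->[3]};
        $csv->print($out_fh, $row);
    }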

    To use a regular hash instead of Tie::File::AsHash, simply replace the tie statement with

    open my $map_fh, '<', 'map.csv' or die $!;
    
    my %replace = map { chomp; split /,/ } <$map_fh>;
    
    close $map_fh;
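
    The Text::CSV loop keys on the fourth field ($row->[3]). If the find string can sit in any column, or anywhere on the line as in the original one-liner, here is a rough single-pass sketch that skips CSV parsing and matches one combined regex per line (assuming %replace has been loaded as in the plain-hash snippet above; it has the same substring-matching caveats as the original approach):

    # Build one alternation of all find strings, longest first so that
    # e.g. findStr10 is tried before findStr1.
    my $find_re = join '|', map quotemeta, sort { length $b <=> length $a } keys %replace;
    $find_re = qr/($find_re)/;

    open my $in_fh,  '<', 'input.csv'  or die $!;
    open my $out_fh, '>', 'output.csv' or die $!;

    while (my $line = <$in_fh>) {
        chomp $line;
        # Append the mapped value for the first find string present on the line.
        $line .= ",$replace{$1}" if $line =~ $find_re;
        print {$out_fh} "$line\n";
    }

    close $in_fh;
    close $out_fh;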