
Search and replace multiple strings such as "foo" and "bar" in a file using Perl? sed is too slow


I have a very large text file, pattern.txt (more than 4 GB), and many entries that need to be searched for and replaced in it.

So I created a file called leo.sed and ran it with sed -f.

leo.sed: This file contains around 500 entries. Example:

s/"PET10"/"PETfdfd0"/g
s/"PET11"/"PET123wef"/g
s/"PET12"/"TETPrandom"/g


I am using the following sed command, but it is extremely slow.

sed -f leo.sed pattern.txt | sed -f leo1.sed > pattern_after_leo_leo1_sed.txt

Is there a faster way to do this with a Perl one-liner?


Solution

  • If it only needs to be done once and the current approach is "fast enough", set it running and do something else. Your time is more valuable than the computer's.


    If you're limited by how fast your disk is, there's not much you can do.

    If not, the same technique, applying 500 patterns to each line, is unlikely to be any faster in Perl. Instead, you need to improve the algorithm by reducing the number of regexes. This can be done by finding a pattern common to all the search terms.

    For example, if every search term is enclosed in quotes, a single regex can match anything in quotes, and the replacement value can then come from a hash. The hash is set up in a BEGIN block so that this is done only once, before the file is scanned. The "baby cart" operator, @{[ ]}, interpolates an expression into a string.

    perl -i.orig -pe 'BEGIN { %replacements = (PET10 => "PETfdfd0", PET11 => "PET123wef"); } s{"([^"]+)"}{"@{[$replacements{$1} // $1]}"}g' test.txt
    

    Now each line only needs to be scanned once. This may or may not be faster, so time it on a sample of the file before committing to the full run.