Perl replace strings within specific blocks in a file


Hi, I am trying to replace strings in a file, test.txt, which contains strings like these:

  <g
   id="g16526">
  <g

  <g
   id="gnnnnn">
  <g

and turn them into

  <g
   id="gg1">
  <g
   ...
  <g
   id="ggn">
  <g

using this Perl script:

    #!C:/Strawberry/perl
    open(FILE, "<test.txt") || die "File not found";
    my @lines = <FILE>;
    close(FILE);
    my $string = '<g
    id=';
    my $string2 = '<g
    <g'; 
    my $anything = ".*";

    my $replace = 'gg';
    my @newlines;
    my $counter = 1;

    foreach(@lines) {
      $_ =~ s/\Qstring$anything\Q$string2/$string$replace$string2$counter/g;
      $counter++;
      push(@newlines,$_);
    }

    open(FILE, ">test.txt") || die "File not found";
    print FILE @newlines;
    close(FILE);

but it doesn't work. Any suggestions appreciated.


Solution

  • If this is indeed XML, as it appears to be, it should be processed with a module built for that, such as XML::LibXML or XML::Twig.
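
    As a rough sketch of the module-based route, assuming XML::LibXML is installed and the real file is well-formed XML (the fragment in the question is incomplete, so the sample document and the helper name renumber_ids_xml below are made up for illustration):

    ```perl
    use strict;
    use warnings;
    use XML::LibXML;

    # Hypothetical helper: give every <g> element that has an id
    # a new id of the form gg1, gg2, ... in document order.
    sub renumber_ids_xml {
        my ($xml) = @_;
        my $doc = XML::LibXML->load_xml(string => $xml);
        my $cnt = 0;
        # local-name() matches <g> with or without a default namespace
        # (SVG files usually declare one)
        for my $node ($doc->findnodes('//*[local-name()="g"][@id]')) {
            $node->setAttribute(id => 'gg' . ++$cnt);
        }
        return $doc->toString;
    }

    # Made-up sample input, not taken from the question's file
    my $sample = '<svg><g id="g16526"><g id="g777"/></g></svg>';
    print renumber_ids_xml($sample);
    ```

    Going through a parser means the result does not depend on how the attributes are laid out across lines.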

    But the task as shown is also easily done in an elementary way:

    perl -0777 -wpE'
        BEGIN { $cnt = 0 };
        # \s+ covers the newline and the indentation before id=
        s/<g\s+id="g\K(.*?)"/q(g).(++$cnt).q(")/eg;
    ' input.txt
    

    which expects the file format to be exactly as shown. The -0777 switch reads the whole file into one string, which isn't the prettiest approach and may be unsuitable for very large files.

    Another way is to set the record separator to <g, so that every "line" read is one block to process:

    perl -wpE'
        # plain assignment: local would be undone as soon as the BEGIN block ends
        BEGIN { $/ = "<g"; $cnt = 0 };
        s/id="g\K(.*?)"/q(g).++$cnt.q(")/eg;
    ' input.txt
    

    where the regex is now free to look for just id="..." and the input is processed one record at a time instead of being slurped whole.
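
    To see what the records look like with that separator, here is a small sketch that reads a made-up sample through an in-memory filehandle. The helper name records_by_g and the sample text are assumptions for illustration only:

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: split text into records that each end with "<g",
    # the way the -p loop sees them once $/ is "<g".
    sub records_by_g {
        my ($text) = @_;
        open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
        local $/ = '<g';      # localized to this sub, restored on return
        my @records = <$fh>;
        close $fh;
        return @records;
    }

    # Made-up sample in the layout shown in the question
    my $sample = qq{<g\n   id="g16526">\n<g\n   id="g42">\n};
    my $cnt = 0;
    for my $rec (records_by_g($sample)) {
        # each record holds at most one id="g..." attribute
        $rec =~ s/id="g\K[^"]*/'g' . ++$cnt/e;
        print $rec;           # ids come out as gg1 and gg2
    }
    ```

    The first record is just the leading "<g", so the substitution does nothing there and the counter only advances on records that actually contain an id.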

    Both print the expected output. They are written as one-liners for easier testing; I suggest moving them into a script.
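
    A minimal sketch of what such a script might look like, using the slurp approach. The sub name renumber_ids and the choice to take the file name from the command line are assumptions, not part of the original one-liners:

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical helper: renumber every id="g..." that directly follows
    # a <g tag. The counter starts at 1 on each call.
    sub renumber_ids {
        my ($text) = @_;
        my $cnt = 0;
        # \s+ spans the newline and indentation between <g and id=
        $text =~ s/<g\s+id="g\K[^"]*/'g' . ++$cnt/eg;
        return $text;
    }

    # Process the file named on the command line, if one was given,
    # and print the result to standard output.
    if (@ARGV) {
        my $file = shift @ARGV;
        open my $in, '<', $file or die "Cannot open $file: $!";
        my $text = do { local $/; <$in> };   # slurp the whole file
        close $in;
        print renumber_ids($text);
    }
    ```

    It could be run as, for example, perl renumber.pl test.txt > new.txt (renumber.pl being whatever name the script is saved under). Writing to a separate file rather than overwriting the input keeps the original around until the output has been checked.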