Tags: perl, sed, pdp-10

Pulling a sed script into a perl program


I need to shorten function names, so I identify them and produce a really long sed script that looks like this:

s/\breally_long_function_name1\b/A00128/g
s/\breally_long_function_name2\b/A00060/g
s/\breally_long_function_name3\b/A00035/g
s/\breally_long_function_name4\b/A00342/g
s/\breally_long_function_name5\b/A00203/g
...

and then call it like this:

`sed -i.bak -f $sedscript *`

The problem is that I can't depend on every sed being able to handle those instances of \b, and shelling out seems messy. Instead of writing out that sed script, I want to keep the substitutions in an array and then do something like this for each line of each file I need to process:

$targetline =~ $processing;

My problem is that using $processing like that won't work, and the q operators (q, qq, qr) don't seem to do the job. How do I massage this so that the substitution held in $processing is applied and the result ends up in $targetline?

Note: I used the question "Perl sed file inside script" and the answer from @ron-bergin there to get this far.

The application for this is to convert modern-ish C code into a form that can be compiled by ancient C compilers. The existing script is at https://gitlab.com/DavidGriffith/frotz/-/blob/master/src/misc/snavig.pl where it's used to prepare source code for compilation by KCC, an early C compiler for PDP-10 mainframes. One of its quirks is that some symbols are limited to being 6 characters in length. Background on this is at https://github.com/PDP-10/panda/blob/master/files/kcc-6/kcc/user.doc#L519.

Here's a fragment from a source file before processing:

void reset_memory(void)
{
        if (story_fp != NULL)
                fclose(story_fp);
        story_fp = NULL;

        if (undo_diff) {
                free_undo(undo_count);
                zfree(undo_diff);
                zfree(prev_zmp);
        }

        undo_diff = NULL;
        undo_count = 0;
        prev_zmp = NULL;

        if (zmp)
                zfree(zmp);
        zmp = NULL;
} /* reset_memory */

After processing, it looks like this:

void A00156(void)
{
        if (A00144 != NULL)
                fclose(A00144);
        A00144 = NULL;

        if (undo_diff) {
                A00155(A00148);
                zfree(undo_diff);
                zfree(A00147);
        }

        undo_diff = NULL;
        A00148 = 0;
        A00147 = NULL;

        if (zmp)
                zfree(zmp);
        zmp = NULL;
} /* A00156 */

Once processed like this, it is proven to compile with KCC and run on TOPS20 on both emulated and real PDP-10 hardware.

When I try to avoid the external sed, the script produces no output and appears to hang. Pressing ^C then gives a flood of this:

Use of uninitialized value $targetline in pattern match (m//) at src/misc/snavig.pl line 181, <$targetfile> line 104.
Use of uninitialized value $targetline in pattern match (m//) at src/misc/snavig.pl line 181, <$targetfile> line 104.
Use of uninitialized value $targetline in pattern match (m//) at src/misc/snavig.pl line 181, <$targetfile> line 104.
Use of uninitialized value $targetline in pattern match (m//) at src/misc/snavig.pl line 181, <$targetfile> line 104.

Solution

  • One way to organize a large number of replacements

    use warnings;
    use strict;
    use feature 'say';
    
    my %repl = ( 
        really_long_function_name1 => 'A00128',
        really_long_function_name2 => 'A00060',
        # ...
    );
    
    my $re = join '|', keys %repl;  # add quotemeta if needed. see text
    
    while (<>) { 
        s/\b($re)\b/$repl{$1}/g;
        print;
    }
    

    The <> operator reads, line by line, the files whose names are given on the command line. Each line, changed or not, is simply printed, so this acts as a filter. If the files need to be edited in place, the code has to be adjusted for that.

    If any of the keys used in the pattern can contain characters that are special in a regex, they should be escaped, and the tool for that is quotemeta: join '|', map { quotemeta } keys %repl. But here the keys are names of functions in a C program, so this is not strictly needed.
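As a minimal sketch of the escaping step, the pattern can be built once and reused for every line. The helper names build_pattern and shorten are made up for illustration, the sample pairs are taken from the before/after fragment in the question, and sorting the keys longest-first is an extra precaution rather than something the \b anchors require.

```perl
use strict;
use warnings;

# Hypothetical helper: build one compiled alternation from the hash keys.
sub build_pattern {
    my ($repl) = @_;
    die "no replacements given\n" unless %$repl;

    my $alternation = join '|',
        map  { quotemeta }                                   # escape regex metacharacters
        sort { length($b) <=> length($a) || $a cmp $b }      # longest names first
        keys %$repl;

    return qr/\b($alternation)\b/;
}

# Hypothetical helper: apply all replacements to one line and return the result.
sub shorten {
    my ($line, $repl, $re) = @_;
    $line =~ s/$re/$repl->{$1}/g;
    return $line;
}

# Pairs copied from the example fragment in the question.
my %repl = (
    reset_memory => 'A00156',
    story_fp     => 'A00144',
    free_undo    => 'A00155',
    undo_count   => 'A00148',
);

my $re = build_pattern(\%repl);
print shorten("        free_undo(undo_count);\n", \%repl, $re);
```

Because the pattern is anchored with \b on both sides, an identifier such as my_free_undo is left alone: the underscore is a word character, so there is no boundary before free_undo.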

    This doesn't deal with some issues (what if some replacements are contained in others?) and may need other adjustments depending on details. I don't quite follow every point in the question, in particular why the replacement list is written to a file. If that is essential, the replacement pairs above can be read from a file in some convenient format (a dump of a Perl data structure? JSON? Or YAML, so that it is also nicely readable?)
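If the lines of the existing sed script are what is at hand, one option is to parse them into the hash instead of inventing a new file format. The following is only a sketch: parse_sed_lines is a hypothetical helper, and it assumes every line has exactly the shape shown in the question, s/\bNAME\b/CODE/g, and silently skips anything else.

```perl
use strict;
use warnings;

# Hypothetical helper: turn lines such as
#   s/\breally_long_function_name1\b/A00128/g
# into pairs  really_long_function_name1 => 'A00128'.
sub parse_sed_lines {
    my @lines = @_;
    my %repl;
    for my $line (@lines) {
        # \\b in this pattern matches the two literal characters "\b"
        # as they appear in the sed script.
        if ($line =~ m{^s/\\b(\w+)\\b/(\w+)/g\s*$}) {
            $repl{$1} = $2;
        }
    }
    return %repl;
}

# Single quotes keep the backslashes exactly as written.
my %repl = parse_sed_lines(
    's/\breally_long_function_name1\b/A00128/g',
    's/\breally_long_function_name2\b/A00060/g',
);

print "$_ => $repl{$_}\n" for sort keys %repl;
```

The resulting hash can then be used in the same way as the hand-written %repl above.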

    The list is easily edited/extended by adding a replacement pair to the hash.


    To edit the files in place, one way is shown in the SO page linked in the question: set the global variable $^I (the value of the -i switch). With an empty string the input files are changed in place and no backup is kept; otherwise its value is added as a suffix to the names of the backup files.

    local $^I = '';   # changes made to input files. no backup 
    
    while (<>) { 
        s/\b($re)\b/$repl{$1}/g;
        print;
    }
    

    or

    local $^I = '.bak';  # added suffix for the backup file(s)
    
    while (<>) { 
        s/.../.../g;
        print;
    }
    

    Make sure this code sits in a small enough scope (a block or a sub) so that local limits the change to $^I and the rest of the program isn't affected!
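One way to get such a small scope is to wrap the loop in a sub, so that both $^I and @ARGV are restored when it returns. This is a sketch under assumptions: shorten_in_place is a hypothetical name, the replacement hash is passed in as a reference, and the demo runs on a scratch file created with File::Temp rather than on real sources.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: rewrite the named files in place, keeping "$file.bak" copies.
sub shorten_in_place {
    my ($repl, @files) = @_;
    return unless @files;    # with an empty @ARGV, <> would read STDIN instead

    my $re = join '|', map { quotemeta } keys %$repl;

    # All three are restored automatically when the sub returns.
    local $_;
    local $^I   = '.bak';
    local @ARGV = @files;

    while (<>) {
        s/\b($re)\b/$repl->{$1}/g;
        print;               # goes to the file being rewritten, not to STDOUT
    }
    return;
}

# Demo on a scratch file (removed again when the program exits).
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/demo.c";
open my $out, '>', $file or die "Cannot write $file: $!";
print {$out} "void reset_memory(void)\n";
close $out;

shorten_in_place({ reset_memory => 'A00156' }, $file);

open my $in, '<', $file or die "Cannot read $file: $!";
print <$in>;
close $in;
```

After the call, $file holds the shortened name and the original text survives in "$file.bak".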

    Or handle the list of files manually if this feels too implicit. When a program is invoked as

    progname [options]  file1 file2...
    

    then in the running program the array @ARGV contains all those words from the command line (everything except the program name).

    When Getopt::Long processes @ARGV, the options (those starting with - or --) are removed, and what remains in @ARGV are the filenames file1, file2, etc. So once Getopt::Long has processed the options you can do

    foreach my $filename (@ARGV) { 
        # handle the file $filename
    }
    

    Or copy the filenames from @ARGV into their own array for safekeeping and process them from there. (Or one can make the filenames part of the options, so that Getopt::Long extracts them.)
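For the body of that loop, a rough sketch using only core modules might read the file, apply the substitutions, keep a backup, and write the result back. The name shorten_file is hypothetical, as is the choice of a ".bak" suffix for the backup; the sample pair comes from the fragment in the question.

```perl
use strict;
use warnings;
use File::Copy qw(copy);

# Hypothetical helper: rewrite one file, keeping the original as "$filename.bak".
sub shorten_file {
    my ($filename, $repl, $re) = @_;

    open my $in, '<', $filename or die "Cannot read $filename: $!";
    my @lines = <$in>;
    close $in;

    # Each element of @lines is aliased to $_ and modified in place.
    s/\b($re)\b/$repl->{$1}/g for @lines;

    copy($filename, "$filename.bak") or die "Cannot back up $filename: $!";

    open my $out, '>', $filename or die "Cannot write $filename: $!";
    print {$out} @lines;
    close $out or die "Cannot close $filename: $!";
    return;
}

my %repl = (story_fp => 'A00144');
my $re   = join '|', map { quotemeta } keys %repl;

# Filenames come from the command line, as in the foreach loop above.
shorten_file($_, \%repl, $re) for @ARGV;
```

Reading the whole file into memory keeps the sketch short; for C source files that should be acceptable, but very large inputs would call for a line-by-line copy to a temporary file instead.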

    If you handle files like this (instead of letting the "diamond" operator <> do it), there are also libraries that can change a file in place, for example the edit_lines method of Path::Tiny

    use Path::Tiny;
    
    path($filename)->edit_lines( sub { s/\b($re)\b/$repl{$1}/g } );