
How to insert only new and/or updated lines into another file


First days dealing with Perl and blocked already :)

Here's the situation: a file is updated in folder A, but copies of it also exist in folders B, C and D. To make things more interesting, the copy can be different in each folder, so I can't just do a diff. New lines that are meant to be copied to the other files are identified by a flag, for instance #I, at the end of the line.

File before being updated looks like this:

    First line
    Second line
    Fifth line

After being updated it looks like this:

    First line
    Second line
    Third line #I
    Fourth line #I
    Fifth line
    Sixth line #I

What I need to do is search for the "Second line" in the other files, insert the lines tagged with #I after it (in the order they were inserted), then search for the "Fifth line" and insert the "Sixth line #I" after it.

In this example they are all consecutive, but in the files I need to update there can be several lines between the first update block and the second (and the third, and so on).
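
The grouping described above (each run of flagged lines belongs to the unflagged line just before it) could be sketched roughly as follows. The helper name `collect_blocks` is hypothetical, the input lines are assumed to be already chomped, and a file that starts with a flagged line would yield an undefined anchor.

```perl
use strict;
use warnings;

# Hypothetical helper: group flagged lines under the unflagged line before them.
# Returns a list of [ $anchor, [ @lines_to_insert ] ] pairs in file order.
# $anchor is undef if the file starts with a flagged line.
sub collect_blocks {
    my ($flag, @lines) = @_;
    my (@blocks, $anchor);
    my $in_block = 0;
    for my $line (@lines) {
        next unless $line =~ /\S/;              # skip empty lines
        if ($line =~ /\Q$flag\E$/) {
            # Start a new block when the previous line was not flagged
            push @blocks, [ $anchor, [] ] unless $in_block;
            push @{ $blocks[-1][1] }, $line;
            $in_block = 1;
        }
        else {
            $anchor   = $line;                  # candidate anchor for the next block
            $in_block = 0;
        }
    }
    return @blocks;
}

my @updated = (
    'First line', 'Second line', 'Third line #I',
    'Fourth line #I', 'Fifth line', 'Sixth line #I',
);

for my $block (collect_blocks('#I', @updated)) {
    my ($anchor, $new) = @$block;
    print "after '$anchor': ", join(' | ', @$new), "\n";
}
```

For the example file this gives two blocks: one anchored at "Second line" and one anchored at "Fifth line".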

The files to be updated can be sh scripts, awk scripts, plain text files, etc., so the script is supposed to be generic. It will take two parameters: the updated file and the file to be updated.

Any hints on how to do this are welcome. I can provide the code I have so far (close, but not working yet) if needed.

Thanks,

João

PS: Here's what I have so far

    # Pass the content of the file $FileUpdate to the updateFile array
    @updateFile = <UPD>;

    # Pass the content of the file $FileOriginal to the originalFile array
    @originalFile = <ORG>;

    # Remove empty lines from the array contained on the updated file
    @updateFile = grep(/\S/, @updateFile);

    # Create an array that will contain the modifications and the line
    # prior to the first modification.
    @modifications = ();

    # Counter initialization
    $i = 0;

    # Loop the array to find out which lines are flagged as new and
    # which lines immediately precede those
    foreach $linha (@updateFile) {

        # Remove \n characters
        chomp($linha);

        # Find the new lines flagged with #I
        if ($linha =~ m/#I$/) {

            # Verify that the previous line is not flagged as updated.
            # If it is not, it means that the update starts here.
            unless ($updateFile[$i-1] =~ m/#I$/) {
                print "Line where the update starts $updateFile[$i-1]\n";

                # Add that line to the array modifications
                push(@modifications, $updateFile[$i-1]);

            } # END OF unless

            print "$updateFile[$i]\n";

            # Add the lines tagged for insertion into the array
            push(@modifications, $updateFile[$i]);

        } # END OF if ($linha =~ m/#I$/)

        # Increment the counter
        $i = $i + 1;

    } # END OF foreach $linha (@updateFile)

    foreach $modif (@modifications) {
        unless ($modif =~ m/#I$/) {
            foreach $original (@originalFile) {
                chomp($original);
                if ($original ne $modif) {
                    push (@newOriginal, $originalFile[$n]);
                }
                elsif ($original eq $modif) { #&& $modif[$n+1] =~ m/#I$/) {
                    push (@newOriginal, $originalFile[$n]);
                    last;
                }
                $n = $n + 1;
            }
        }
        if ($modif =~ m/#I$/) {
            push (@newOriginal, $modifications[$m]);
        }
        $m = $m + 1;
    }

The result is close to what I want, but not quite there yet.


Solution

  • I was finally able to come back to this issue, and it seems I have solved it. It is probably not the best or "prettiest" solution, but it does what I need :) .

    # First parameter is the file containing the update
    my ($FileUpdate) = $ARGV[0];

    # Second parameter is the file to be updated
    my ($FileOriginal) = $ARGV[1];

    # Open both files for reading and give them handles to be referred to
    # further ahead. $! holds the reason when open fails.
    open(UPD, '<', $FileUpdate) or die("Could not open file $FileUpdate: $!");
    open(ORG, '<', $FileOriginal) or die("Could not open file $FileOriginal: $!");
    
    # ------------------------------------------------ #
    # ---------------- ARRAY CREATION ---------------- #
    # ------------------------------------------------ #
    
    # Pass the content of the file $FileUpdate to the updateFile array
    @updateFile = <UPD>;
    
    # Pass the content of the file $FileOriginal to the originalFile array
    @originalFile = <ORG>;
    
    # Remove empty lines from the array contained on the updated file
    @updateFile = grep(/\S/, @updateFile);
    
    # Create an array that will contain the modifications and the line
    # prior to the first modification.
    @modifications = ();
    
    # Counter initialization
    $i = 0;
    
    
    # ------------------------------------------------ #
    # ----- LOOP TO IDENTIFY LINES FOR INSERTION ----- #
    # ------------------------------------------------ #
    
    # Loop the array to find out which lines are flagged as new and
    # which lines immediately precede those
    foreach $linha (@updateFile) {

        # Remove the trailing newline
        chomp($linha);

        # Find the new lines flagged with #I
        if ($linha =~ m/#I$/) {

            # If the previous line is not flagged, the update starts here,
            # so keep that line as the anchor. The $i > 0 test stops
            # $updateFile[-1] (the last line) from being used as the anchor
            # when the very first line is flagged.
            if ($i > 0 && $updateFile[$i-1] !~ m/#I$/) {

                # Add that line to the array modifications
                push(@modifications, $updateFile[$i-1]);

            }

            # Add the lines tagged for insertion into the array
            push(@modifications, $updateFile[$i]);

        } # END OF if ($linha =~ m/#I$/)

        # Increment the counter
        $i = $i + 1;

    } # END OF foreach $linha (@updateFile)
    
    
    # ------------------------------------------------ #
    # ----------- SHOW THE MODIFICATIONS ------------- #
    # ------------------------------------------------ #
    foreach $valor (@modifications) {
        print "$valor\n";
    }
    
    # ------------------------------------------------ #
    # -------------------- BACKUP -------------------- #
    # ------------------------------------------------ #
    
    # Make a backup copy from the original file   
    # in case something goes wrong when updating it
    
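    # Build a timestamp such as 20240131-1542 for the backup file name.
    # No "\n" in the format: it would end up inside the file name.
    use POSIX qw(strftime);
    $tt = strftime("%Y%m%d-%H%M", localtime);

    # The list form of system() bypasses the shell, so file names with
    # spaces or special characters are passed to cp unchanged.
    system("cp", $FileOriginal, "$FileOriginal.$tt") == 0
        or die("Could not back up $FileOriginal!");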
    
    # ------------------------------------------------ #
    # ------------- INSERT THE NEW LINES ------------- #
    # ------------------------------------------------ #
    
    # New file array
    @newOriginal = ();

    # Make sure no modification line carries a trailing newline
    chomp(@modifications);

    # Go through the original file: copy each line across, then append
    # the flagged lines that should follow it.
    foreach $original (@originalFile) {

        # Remove the trailing newline
        chomp($original);

        # Add the line unless an identical line is already in the result.
        # Note: this also drops repeated lines from the original file.
        unless (grep { $_ eq $original } @newOriginal) {
            push(@newOriginal, $original);
        }

        # Look for the current line among the entries of @modifications
        foreach $n (0..$#modifications) {

            # Only an entry identical to the current line marks an insertion point
            next unless $original eq $modifications[$n];

            # Insert every #I line that immediately follows that entry,
            # stopping at the first line that is not flagged.
            foreach my $igual ($n+1..$#modifications) {
                last unless $modifications[$igual] =~ m/#I$/;

                # Skip flagged lines that are already in the result
                unless (grep { $_ eq $modifications[$igual] } @newOriginal) {
                    push(@newOriginal, $modifications[$igual]);
                }
            }
        }
    }
    
    # ------------------------------------------------ #
    # ------------- RESULTS PRESENTATION ------------- #
    # ------------------------------------------------ #
    print "--------------------\n";
    foreach $vl (@newOriginal) {
        print "newOriginal: $vl\n";
    }
    print "--------------------\n";
    
    # ------------------------------------------------ #
    # ------------- CREATE UPDATED FILE -------------- #
    # ------------------------------------------------ #
    # Create the new name for the file - only for testing purposes now, it will be the original name afterwards
    $NewFileToWriteTo = $FileOriginal;
    # Retrieve the extension of the file to be updated (empty if there is none)
    my ($ext) = $FileOriginal =~ /(\.[^.\/]+)$/;
    $ext = '' unless defined $ext;
    # Remove the extension - just for testing purposes because I want to change the file name now.
    # \Q...\E makes the dot literal and $ anchors the match at the end of the name.
    $NewFileToWriteTo =~ s/\Q$ext\E$//;
    # Create the new file name by adding the suffix _tst and the correct extension to it.
    $NewFileToWriteTo = $NewFileToWriteTo . '_tst' . $ext;

    # Create the new file or die in case it is not possible to open it
    open(DAT, '>', $NewFileToWriteTo) or die("Could not open file $NewFileToWriteTo: $!");

    # Write to the new file. This will be the UPDATED version of the ORIGINAL file.
    foreach $vl (@newOriginal) {
        print DAT "$vl\n";
    }
    
    # Close all files
    close(DAT);
    close(UPD);
    close(ORG);
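
For comparison, the insertion step could also be written more compactly with a hash lookup. This is only a sketch and not the script above: the helper name `merge_flagged` is hypothetical, lines are assumed to be already chomped, and each anchor line is assumed to be unique in the updated file. Unlike the script above, it keeps repeated lines from the target file, and it skips flagged lines the target already contains, so running it twice should not duplicate them.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of the target file with the flagged
# lines of the updated file inserted after their anchor lines.
sub merge_flagged {
    my ($updated, $target, $flag) = @_;

    # Map each anchor line to the flagged lines that follow it
    my (%insert_after, $anchor);
    for my $line (@$updated) {
        if ($line =~ /\Q$flag\E$/) {
            push @{ $insert_after{$anchor} }, $line if defined $anchor;
        }
        elsif ($line =~ /\S/) {
            $anchor = $line;                    # last unflagged, non-empty line
        }
    }

    # Flagged lines already in the target are assumed to come from an earlier run
    my %present = map { $_ => 1 } grep { /\Q$flag\E$/ } @$target;

    my @result;
    for my $line (@$target) {
        push @result, $line;
        my $new = delete $insert_after{$line} or next;   # use each anchor once
        push @result, grep { !$present{$_} } @$new;
    }
    return @result;
}

my @updated = (
    'First line', 'Second line', 'Third line #I',
    'Fourth line #I', 'Fifth line', 'Sixth line #I',
);
my @target = ('Header', 'First line', 'Second line', 'Something else', 'Fifth line');

print "$_\n" for merge_flagged(\@updated, \@target, '#I');
```

Here the target has extra lines ("Header", "Something else") that the updated file does not, which is the situation described in the question: the flagged lines still land right after "Second line" and "Fifth line".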