
Merging two nearly identical functions in Perl


I have a script that gets some data from a website. The data comes in JSON format and the site offers an option to "flatten" the JSON output into a single JSON object, or leave it as multiple objects.

The script has options that allow for converting the JSON data to YAML (whether flattened or not), or leaving it in JSON format.

In addition the script colorizes the values in both formats.

In order to accomplish the coloring, I currently have 2 functions, one for JSON colorization, and one for YAML colorization.

Colorization itself is done with Term::ANSIColor, by search-and-replace on the text, which is passed in as either a scalar reference (YAML) or an array reference (JSON), depending on the output format.
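
For readers unfamiliar with the constants interface, here is a minimal sketch of one such substitution on a single flattened line. The sample line and the variable names are made up for illustration. The captures are copied into lexicals before the color constants are called; this is only a precaution, since $1 and $2 are globals that a regex inside a called function could overwrite.

```perl
use strict;
use warnings;
use Term::ANSIColor qw(:constants);

# Illustrative input: one flattened "key:subkey: value" line.
my $line = 'backupAsset:backupFlag: xxxxxxxx';

# BOLD CYAN $first, RESET $second nests as BOLD(CYAN($first, RESET($second))):
# each constant returns its escape code followed by its joined arguments.
(my $colored = $line) =~ s{^([a-zA-Z0-9]+:)([a-zA-Z0-9]+:\ )}
                          {
                              my ($first, $second) = ($1, $2);
                              BOLD CYAN $first, RESET $second;
                          }xmse;

# The same string assembled by explicit concatenation.
my $expected = BOLD() . CYAN() . 'backupAsset:'
             . RESET() . 'backupFlag: ' . 'xxxxxxxx';

print $colored, RESET(), "\n";
print "chained and concatenated forms agree\n" if $colored eq $expected;
```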

I would like to get this down to one function in order to reduce the code duplication, but I am at a loss for how to accomplish this.

To be clear, the main focus of this question is how to make one colorization function reusable so that it works on both the YAML and the JSON output. Because the search patterns are very similar and the replacement patterns are identical, I feel this should be easy to accomplish, but I'm drawing a blank on how to do it.

use JSON;
use Term::ANSIColor qw(:constants);    # BOLD, CYAN, RESET, ...
use YAML::Tiny;

sub colorize_yaml
{
    my $OUTPUT  = shift;
    my $OPTIONS = shift;

    if (ref $OUTPUT eq 'SCALAR')
    {
        foreach (${$OUTPUT})
        {

            # Hide this if debugging is disabled, else show it and color it
            if (!$OPTIONS->{debug})
            {
                s{(statusCode|success|dataExist|verumModelObjectName):\ [a-zA-Z0-9]+\n}
{}gxms;
            }
            else
            {
                s{(statusCode|success|dataExist|verumModelObjectName):}
{$OPTIONS->{color} ? BOLD YELLOW $1 . ':', BOLD GREEN : $1 . ':'}gxmse;
            }

            # Colorize 5 segment flat output
            s{([a-zA-Z0-9]+:)([a-zA-Z0-9]+:)([a-zA-Z0-9]+:)([a-zA-Z0-9]+:)([a-zA-Z0-9]+:\ )}
{$OPTIONS->{color} ? BOLD CYAN $1, BOLD YELLOW $2, BOLD MAGENTA $3, BOLD RED $4, RESET $5: $1 . $2 . $3 . $4 . $5}gxmse;

            # Colorize 4 segment flat output
            s{([a-zA-Z0-9]+:)([a-zA-Z0-9]+:)([a-zA-Z0-9]+:)([a-zA-Z0-9]+:\ )}
{$OPTIONS->{color} ? BOLD CYAN $1, BOLD YELLOW $2, BOLD MAGENTA $3, RESET $4 : $1 . $2 . $3 . $4}gxmse;

            # Colorize 3 segment flat output
            s{([a-zA-Z0-9]+:)([a-zA-Z0-9]+:)([a-zA-Z0-9]+:\ )}
{$OPTIONS->{color} ? BOLD CYAN $1, BOLD YELLOW $2, RESET $3 : $1 . $2 . $3}gxmse;

            # Colorize 2 segment flat output
            s{([a-zA-Z0-9]+:)([a-zA-Z0-9]+:\ )}
{$OPTIONS->{color} ? BOLD CYAN $1, RESET $2 : $1 . $2}gxmse;

            # Colorize values in all output
            s{(:\ )}
{$OPTIONS->{color} ? $1 . BOLD GREEN : $1}gxmse;

            # Reset colors before newlines so that the next line starts with a clean color pattern.
            s{\n}
{$OPTIONS->{color} ? RESET "\n" : "\n"}gxmse;
        }
    }
    else
    {
        pretty_print_error("WARNING: Unable to colorize YAML output\n", $OPTIONS->{color});
        return;
    }

    return;
}

sub colorize_json
{
    my $OUTPUT  = shift;
    my $OPTIONS = shift;

    if (ref $OUTPUT eq 'ARRAY')
    {
        foreach (@{$OUTPUT})
        {
            if ($OPTIONS->{debug})
            {
                s{(statusCode|success|dataExist|verumModelObjectName):}
{$OPTIONS->{color} ? BOLD YELLOW $1 . ':', BOLD GREEN : $1 . ':'}gxmse;
            }
            else
            {
                s{(statusCode|success|dataExist|verumModelObjectName):\ [a-zA-Z0-9]+\n}
{}gxms;
            }

            # Colorize 5 segment flat output
            s{^([a-zA-Z0-9]+:)([a-zA-Z0-9]+:)([a-zA-Z0-9]+:)([a-zA-Z0-9]+:)([a-zA-Z0-9]+:\ .*$)}
{$OPTIONS->{color} ? BOLD CYAN $1, BOLD YELLOW $2, BOLD MAGENTA $3, BOLD RED $4, RESET $5: $1 . $2 . $3 . $4 . $5}gxmse;

            # Colorize 4 segment flat output
            s{^([a-zA-Z0-9]+:)([a-zA-Z0-9]+:)([a-zA-Z0-9]+:)([a-zA-Z0-9]+:\ )}
{$OPTIONS->{color} ? BOLD CYAN $1, BOLD YELLOW $2, BOLD MAGENTA $3, RESET $4 : $1 . $2 . $3 . $4}gxmse;

            # Colorize 3 segment flat output
            s{^([a-zA-Z0-9]+:)([a-zA-Z0-9]+:)([a-zA-Z0-9]+:\ )}
{$OPTIONS->{color} ? BOLD CYAN $1, BOLD YELLOW $2, RESET $3 : $1 . $2 . $3}gxmse;

            # Colorize 2 segment flat output
            s{^([a-zA-Z0-9]+:)([a-zA-Z0-9]+:\ )}
{$OPTIONS->{color} ? BOLD CYAN $1, RESET $2 : $1 . $2}gxmse;

            # Colorize values in all output
            s{(:\ )}
{$OPTIONS->{color} ? $1 . BOLD GREEN : $1}gxmse;

            # Reset colors before newlines so that the next line starts with a clean color pattern.
            s{$}
{$OPTIONS->{color} ? RESET '' : ''}gxmse;
        }
    }
    else
    {
        pretty_print_error("WARNING: Unable to colorize JSON output.\n", $OPTIONS->{color});
        return;
    }

    return;
}

JSON converted to YAML

---
message: Success
ObjectList:
  -
    assetName: xxxxxxxx
    backupAsset:
      -
        backupFlag: xxxxxxxx
        fullyCertified: xxxxxxxx

Flattened JSON converted to YAML

---
message: Success
verumObjectList:
  -
    assetName: xxxxxxxx
    backupAsset:backupFlag: xxxxxxxx
    backupAsset:fullyCertified: xxxxxxxx

JSON (the data in JSON format is stripped by the script to make it plain text)

assetName: xxxxxxxx
backupFlag: xxxxxxxx
fullyCertified: xxxxxxxx
message: Success

Flattened JSON (the data in JSON format is stripped by the script to make it plain text)

assetName: xxxxxxxx
backupAsset:backupFlag: xxxxxxxx
backupAsset:fullyCertified: xxxxxxxx
message: Success

The correct answer is awarded to @zdim though I did have to tweak the code slightly.

I'm posting my updated code below.

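use English qw(-no_match_vars);        # provides $ARG
use JSON;
use Readonly;
use Term::ANSIColor qw(:constants);    # provides BOLD, CYAN, RESET, ...
use YAML::Tiny;

# $EMPTY is assumed to be an empty-string constant defined elsewhere in the
# script, along these lines:
Readonly my $EMPTY => q{};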

sub colorize_output
{
    my $OUTPUT   = shift;
    my $OPTIONS  = shift;

    my $RE_START = $EMPTY;
    my $RE_END   = q{\ };

    if (ref $OUTPUT ne 'SCALAR' && ref $OUTPUT ne 'ARRAY')
    {   
        pretty_print_error("WARNING: Unable to colorize output.\n", 
            $OPTIONS->{color});
        return;
    }   
    elsif (ref $OUTPUT eq 'ARRAY')
    {   
        $RE_START = q{^};
        $RE_END   = q{\ .*};
    }   

    my $ANCHOR    = q{[a-zA-Z0-9]+:};
    my $PATTERN   = qq{($ANCHOR)};

    Readonly my $SEGMENT_LIMIT => 4;

    my $VERUM_RE = qr{(statusCode|success|dataExist|verumModelObjectName):}xms;

    my ($SEGMENT_2PART_RE, $SEGMENT_3PART_RE, $SEGMENT_4PART_RE, $SEGMENT_5PART_RE)
        = map { 
            qr{$RE_START}xms . ($PATTERN x $ARG) . qr{($ANCHOR$RE_END)}xms 
        } 1..$SEGMENT_LIMIT;

    foreach ((ref $OUTPUT eq 'SCALAR')?${$OUTPUT}:@{$OUTPUT})
    {   

        # Hide this if debugging is disabled, else show it and color it
        if (!$OPTIONS->{debug})
        {   
            s{$VERUM_RE\ [a-zA-Z0-9]+}{}gxms;
        }   
        else
        {   
            s{$VERUM_RE}
             {$OPTIONS->{color} ? BOLD YELLOW $1 . ':', BOLD GREEN : $1 . ':'}gxmse;
        }   

        # Colorize sections in flat output
        if ($OPTIONS->{color})
        {   
            s{$SEGMENT_5PART_RE}
             {BOLD CYAN $1, BOLD YELLOW $2, BOLD MAGENTA $3, BOLD RED $4, RESET $5}gxmse;
            s{$SEGMENT_4PART_RE}
             {BOLD CYAN $1, BOLD YELLOW $2, BOLD MAGENTA $3, RESET $4}gxmse;
            s{$SEGMENT_3PART_RE}
             {BOLD CYAN $1, BOLD YELLOW $2, RESET $3}gxmse;
            s{$SEGMENT_2PART_RE}
             {BOLD CYAN $1, RESET $2}gxmse;

            # Colorize values in all output
            s{(:\ )}{$1 . BOLD GREEN}gxmse;

            # Reset colors before newlines or next entry in the list so that
            # the next line starts with a clean color pattern.
            s{(\n|$)}{RESET $1}gxmse;
        }   
    }   

    return;
}   

Solution

  • This answers the question of how to refactor these functions, without any broader context.

    One difference is the input: it is either a scalarref or an arrayref.

    The other two differences, which are more involved, are in the regexes: the arrayref patterns are anchored with ^ and their last alphanumeric segment ends with \ .*$, while the scalarref patterns are unanchored and their last segment ends with just an escaped space.

    Finally, if $OPTIONS->{color} is false, every substitution replaces the match with itself, so the string is left unchanged. The condition can therefore be pulled out of the replacements and tested once.

    sub colorize_yaml_json {
        my ($OUTPUT, $OPTIONS) = @_;

        # Defaults suit the scalarref (YAML) case: no anchor, trailing space
        my $anchor = '';
        my $last   = qr{\ };

        if (ref $OUTPUT eq 'ARRAY') {
            $anchor = qr{^};
            $last   = qr{\ .*$};
        }
        elsif (ref $OUTPUT ne 'SCALAR') {
            pretty_print_error(...);
            return;
        }

        my $anc  = qr{[a-zA-Z0-9]+:};  # alphanumeric with colon
        my $patt = qr{($anc)};

        # 1..4 captured segments plus the last one: 2- to 5-segment patterns
        my ($seg2_re, $seg3_re, $seg4_re, $seg5_re) = map {
            qr/$anchor/ . ($patt x $_) . qr/($anc$last)/
        } 1..4;

        # Iterate over the dereferenced data itself, not a copy, so that
        # s/// changes the caller's string(s) in place
        foreach (ref $OUTPUT eq 'SCALAR' ? $$OUTPUT : @$OUTPUT) {
            if ($OPTIONS->{debug}) {
                ...
            }
            if ($OPTIONS->{color}) {
                s{$seg5_re}{BOLD CYAN $1, ... }gxmse;
                ...
            }
        }
        return 1;
    }
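
    Both original functions edit their input in place, and a merged version has to keep doing that. Here is a minimal sketch of the mechanism, using a hypothetical upcase_keys helper (not part of the original script) in place of the colorization. foreach aliases $_ to the dereferenced scalar or to each array element, so s/// changes the caller's data. Copying the string into a new array first would change only the copy.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the colorizer: upper-cases each leading key,
# editing the caller's data in place.
sub upcase_keys {
    my ($output) = @_;

    my $type = ref $output;
    return if $type ne 'SCALAR' && $type ne 'ARRAY';

    # $_ is an alias for the referenced string, or for each array element.
    foreach ($type eq 'SCALAR' ? ${$output} : @{$output}) {
        s{^([a-zA-Z0-9]+):}{\U$1\E:}gm;
    }
    return 1;
}

my $yaml  = "message: Success\nassetName: xxxxxxxx\n";
my @lines = ('message: Success', 'assetName: xxxxxxxx');

upcase_keys(\$yaml);     # scalarref, as for the YAML output
upcase_keys(\@lines);    # arrayref, as for the JSON output

print $yaml;
print "$_\n" for @lines;
```

    The same function body serves both input types; only the list handed to foreach differs.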
    

    The map assembles the full pattern for each of the four cases: it repeats the captured alphanumeric-plus-colon pattern $patt 1 to 4 times (via x $_ for 1..4) and then appends the last segment, giving 2 to 5 segments in total.

    The unpleasant complication is that each occurrence of the base pattern $anc needs to be captured, which is why $patt wraps it in parentheses.
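
    The pattern assembly can be tried on its own. This is a small sketch for the scalarref flavour (no anchor, last segment ending in a space). The sample line and the @seg_re and @found names are made up for illustration.

```perl
use strict;
use warnings;

my $anc  = qr{[a-zA-Z0-9]+:};    # one segment: alphanumerics plus a colon
my $patt = qr{($anc)};           # the same segment, captured

# Scalarref flavour: unanchored, last segment followed by a space.
my ($anchor, $last) = (q{}, qr{\ });

# Index 0 holds the 2-segment pattern, index 3 the 5-segment one.
my @seg_re = map { qr/$anchor/ . ($patt x $_) . qr/($anc$last)/ } 1 .. 4;

my $line = 'a:b:c: value';
my @found;

# Try the longest pattern first, in the same order as the substitutions.
for my $re (reverse @seg_re) {
    @found = $line =~ m{$re};
    last if @found;
}

printf "%d segments captured: %s\n", scalar @found, join '|', @found;
```

    Each repetition of $patt contributes one capture group, so the 3-segment pattern that matches here fills $1 to $3.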

    I could only test this on my mock-up data so please check carefully (as always!).

    There is also the broader question of how best to handle the whole scenario, but that wasn't asked, and there isn't enough information here to address it without a lot of guessing.