
Put nested arrays on one line with regex


I would like to put the arrays of the following file on one line:

{
    "name": "John",

    "age": 30,

    "cars": [

    
        {
            "name": "Ford",
            "models": [
                "Fiesta",
                "Focus",
                "Mustang"
            ]
        },
        {
            "name": "BMW",
            "models": [
                "320",
                "X3",
                "X5"
            ]

        },

        {


            "name": "Fiat",
            "models": [
                "500",


                "Panda"
            ]
        }
    ]
}

I tried this regex:

s/:\s*\[\K\n.*?(?=\])/$&=~s@\s+@@rgs/egs

that allows me to put arrays on a single line, but it doesn't work for nested arrays. How can I modify it to handle nested arrays?


Solution

  • Perl has good libraries for working with JSON data like that shown in the question. JSON::PP is in core, and there are the faster JSON::XS and Cpanel::JSON::XS. Each of them reads a JSON file or string and returns a Perl data structure.

    Then you can pretty-print that as you like, again using libraries. Here are two examples.

    use warnings;
    use strict;
    use feature 'say';
    
    use Data::Dump qw(dumpf);
    use Path::Tiny;  # path()
    use Cpanel::JSON::XS; 
    use List::MoreUtils qw(none);  # also in List::Util in newer versions
    
    use Data::Printer filters => [ { 
        ARRAY => sub { 
            # Print arrays on one line if at "bottom" (no further references)
            if ( none { ref } @{$_[0]} ) {
                return join ', ', map { q(").$_.q(") } @{$_[0]}
            }
            else { return }  # or return `undef` to have it passed through
        },  
    }];
    
    my $file = shift // die "Usage: $0 filename\n";
    
    my $json = path($file)->slurp; 
    my $ds = Cpanel::JSON::XS->new->decode($json); 
    
    say "w/ Data::Dump, using filters:";
    say dumpf( $ds, sub {
        my ($ctx, $obj) = @_; 
        return ($ctx->is_array and none { ref } @$obj)
            ? { object => "@$obj" }
            : undef;                 # no filtering, carry on normally
    });
    
    say "\nw/ Data::Printer:";     
    p( $ds, index => 0 );
    

    I normally use Data::Dump for its concise default output with dd and pp. It prints short arrays and hashes on one line, but breaks longer ones over multiple lines for readability, so here I use a filter. The none { ref } LIST expression returns true if ref returns false for every item in the list, meaning that none of the items are references and all are plain scalars: the "bottom" of the data structure.

    That part of the code prints the data structure as

    {
      age  => 30,
      cars => [
                { models => "Fiesta Focus Mustang", name => "Ford" },
                { models => "320 X3 X5", name => "BMW" },
                { models => "500 Panda", name => "Fiat" },
              ],
      name => "John",
    }
    
    

    Even if those arrays were much longer they'd still be printed on one line.
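
    The none { ref } test used in both filters can be tried in isolation. The following is a minimal sketch using only the core List::Util module; the helper name is_bottom_array and the sample arrays are made up for illustration and are not part of the program above.

```perl
use strict;
use warnings;
use List::Util qw(none);   # core module; `none` needs List::Util 1.33+ (Perl 5.20+)

# Hypothetical helper: true when no element of the array is a reference,
# i.e. the array sits at the "bottom" of a data structure.
sub is_bottom_array {
    my ($aref) = @_;
    return none { ref } @$aref;
}

my @flat   = ('Fiesta', 'Focus', 'Mustang');     # plain strings only
my @nested = ('Ford', ['Fiesta', 'Focus']);      # contains an array reference

print is_bottom_array(\@flat)   ? "flat: bottom\n"   : "flat: has references\n";
print is_bottom_array(\@nested) ? "nested: bottom\n" : "nested: has references\n";
```

    Note that an empty array also passes the test, since none is true for an empty list.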

    However, the output still follows a particular format, since Data::Dump serializes data structures in such a way that they can be read back in (and rebuilt) by perl. (Once we customize how an array is printed, though, the string no longer evaluates back to the original data structure.)

    Data::Printer, on the other hand, isn't concerned with serialization or with producing valid Perl; it is a tool meant only for showing data for review. It has many features, and in particular one can write filters.

    So its use in the demo above is one take on what the question asks: print arrays on one line, where I assume only those at the "bottom" of a data structure, having no further references. That prints (also colorized, not seen here)

    {
        age    30,
        cars   [
            {
                models   "Fiesta", "Focus", "Mustang",
                name     "Ford"
            },
            {
                models   "320", "X3", "X5",
                name     "BMW"
            },
            {
                models   "500", "Panda",
                name     "Fiat"
            }
        ],
        name   "John"
    }
    

    This can be customized far more.

    I strongly suggest not fiddling with the printed details of specific data formats, like JSON, using regex; it's going to be very messy and unreliable, and mostly a waste of time. (Except in the simplest of cases, or to fine-tune a library's output...)


    For the record, here is a regex that rewrites that JSON as wanted

    perl -0777 -wpE'
        s{\[ ([^\[\]]+) \] }{ q([) . $1 =~ s/\s*\n+\s*/ /gr . q(]) }gex
    ' spread_out.json
    

    The pattern as it stands would be badly confused by a literal [ or ] in a string inside of an array.

    Note that this still leaves the empty lines outside of the arrays. If you want those cleaned out as well, add another regex for that, or first parse and rewrite the file, with jq for example.
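
    As a rough sketch of the "another regex" option, the script below joins the innermost arrays and then deletes blank lines. The sample input is a shortened, made-up variant of the file in the question, and the character class here excludes both [ and ] so that a match cannot run past an array's own closing bracket. The same caveats about brackets inside strings apply.

```perl
use strict;
use warnings;

# Hypothetical sample input, shortened from the file in the question
my $json = <<'END';
{
    "name": "Fiat",

    "models": [
        "500",


        "Panda"
    ]

}
END

# Step 1: put each innermost array (one with no brackets inside) on one line
$json =~ s{\[ ([^\[\]]+) \]}{ q([) . $1 =~ s/\s*\n+\s*/ /gr . q(]) }gex;

# Step 2: drop lines that are empty or contain only horizontal whitespace
$json =~ s/^\h*\n//mg;

print $json;
```

    Both substitutions work on the whole text at once, which is why the one-liner above reads the file with -0777.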

    But again I don't recommend changing your data like this for a minor benefit in readability. I'd rather keep the data itself (JSON) as is (correct and safe) and parse it and then work with it using reliable dedicated tools, for instance like in the program above.

    I also do not recommend relying on this regex as it hasn't been tested enough and small changes in data might break it.
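
    If the goal is JSON output rather than a Perl-style dump, the parse-then-print approach can be sketched with core modules only. The function compact_json below is a hypothetical helper written for this illustration, not something JSON::PP provides: it walks the decoded structure and emits "bottom" arrays on one line. It assumes that sorted keys are acceptable, since the original key order is not kept once the data is in a Perl hash.

```perl
use strict;
use warnings;
use JSON::PP;                 # core module since Perl 5.14
use List::Util qw(none);      # `none` needs List::Util 1.33+ (Perl 5.20+)

my $codec = JSON::PP->new->allow_nonref;   # encodes single scalars and keys

# Hypothetical helper: serialize $data as indented JSON, but keep arrays
# that contain no references on a single line.
sub compact_json {
    my ($data, $depth) = @_;
    $depth //= 0;
    my $pad = '    ' x ($depth + 1);   # indentation for nested items
    my $end = '    ' x $depth;         # indentation for the closing bracket

    if (ref $data eq 'HASH') {
        return '{}' unless %$data;
        my @pairs;
        for my $key (sort keys %$data) {
            push @pairs, $pad . $codec->encode($key) . ': '
                       . compact_json($data->{$key}, $depth + 1);
        }
        return "{\n" . join(",\n", @pairs) . "\n$end}";
    }
    if (ref $data eq 'ARRAY') {
        return '[]' unless @$data;
        if (none { ref } @$data) {     # "bottom" array: one line
            return '[' . join(', ', map { $codec->encode($_) } @$data) . ']';
        }
        my @items = map { $pad . compact_json($_, $depth + 1) } @$data;
        return "[\n" . join(",\n", @items) . "\n$end]";
    }
    return $codec->encode($data);      # string, number, null, or boolean object
}

my $text = '{"name":"John","age":30,"cars":[{"name":"Ford","models":["Fiesta","Focus","Mustang"]}]}';
print compact_json( $codec->decode($text) ), "\n";
```

    Because JSON booleans decode to objects, an array holding true or false is not treated as a "bottom" array by this test and is printed one item per line.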