json bash awk scope bsd

Problem with my custom function `json_stringify` in `awk`


I've written a shell function that converts a UTF-8 encoded string to a JSON string, using awk.

json_stringify() {

    LANG=C awk '

        BEGIN {
            for ( i = 1; i < ARGC; i++ )
                print json_stringify(ARGV[i])
        }

        function json_stringify( str, _str, _out ) {

            if( ! ("\\" in _ESC_) )
                for ( i = 1; i <= 127; i++ )
                    _ESC_[ sprintf( "%c", i) ] = sprintf( "\\u%04x", i )

            _str = str
            _out = "\""

            while ( match( _str, /[\"\\[:cntrl:]]/ ) ) {
                _out = _out substr(_str,1,RSTART-1) _ESC_[substr(_str,RSTART,RLENGTH)]
                _str = substr( _str, RSTART + RLENGTH )
            }

            return _out _str "\""
        }
    ' "$@"
}

It feels like I missed something trivial, because when I run (in bash):

json_stringify 'A"B' 'C\D' $'\b \f \t \r \n'

I get:

"A\u0022B"

while my expected output is:

"A\u0022B"
"C\u005cD"
"\u0008 \u000c \u0009 \u000d \u000a"

What could be the problem(s) in my code?


Solution

  • One issue I see is that i is used as the loop variable in both the BEGIN block's for loop and the function's for loop. awk variables are global unless they appear in the function's parameter list, so the whole script ends up sharing a single i. The first call to json_stringify() runs its own loop until i reaches 128, which is well beyond ARGC, so the BEGIN loop exits after its first iteration (i=1) and only the first argument is printed.

    Two possible fixes:

    Declare i as local to the function by adding it to the parameter list, e.g.:

    function json_stringify( str, _str, _out, i ) {
    

    Or use a different loop variable (e.g., j) in one of the loops:

    # in the BEGIN block:
    
    for ( j = 1; j < ARGC; j++ )
        print json_stringify(ARGV[j])
    
    # or in the function:
    
    for ( j = 1; j <= 127; j++ )
        _ESC_[ sprintf( "%c", j) ] = sprintf( "\\u%04x", j )
    

    Testing each of the possible fixes allows me to generate:

    "A\u0022B"
    "C\u005cD"
    "\u0008 \u000c \u0009 \u000d \u000a"
    

    See "Controlling Variable Scope" in the GNU awk manual for a brief discussion of this topic.
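
To see the scoping behaviour in isolation, here is a minimal sketch that is separate from the code in the question. The names scope_demo, leaky and scoped are hypothetical and exist only for this illustration. Both awk functions run the same loop; the only difference is whether i appears in the parameter list.

```shell
#!/bin/sh
# Minimal sketch: awk variables are global unless they are named in the
# function's parameter list. scope_demo, leaky and scoped are made-up names.

scope_demo() {
    awk '
        # i is NOT in the parameter list, so this loop overwrites the global i
        function leaky( n,    total ) {
            for ( i = 1; i <= n; i++ )
                total += i
            return total
        }

        # i IS in the parameter list (after the conventional extra spaces),
        # so it is local and the global i is left untouched
        function scoped( n,    total, i ) {
            for ( i = 1; i <= n; i++ )
                total += i
            return total
        }

        BEGIN {
            i = 1; leaky(5);  print "after leaky:  i = " i
            i = 1; scoped(5); print "after scoped: i = " i
        }
    '
}

scope_demo
```

After leaky(5) the global i is 6, while after scoped(5) it is still 1. The same thing happens in the question: the function's loop moves the BEGIN block's counter past ARGC.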