
Parse hocon file using shell script


I have a HOCON configuration file that was generated from a JSON file. I need to parse the following HOCON and extract values from it.

Sample HOCON file: sample.json

    nodes=[
    {
        host=myhostname
        name=myhostname
        ports {
            # debug port
            debug=9384
            # http Port on which app running
            http=9380
            # https Port on which app running
            https=9381
            # JMX port
            jmx=9383
        }
        type=app
        vm-args=[
            "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintClassHistogram",
            "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC ",
            "-XX:+UseTLAB -XX:CMSInitiatingOccupancyFraction=80 -XX:+ExplicitGCInvokesConcurrent -verbose:gc",
            "-XX:SurvivorRatio=8 -XX:+UseNUMA -XX:TargetSurvivorRatio=80 -XX:MaxTenuringThreshold=15",
            "-Xmx3200m -Xms3200m -XX:NewSize=1664m -XX:MaxNewSize=1664m -Xss1024k",
            "-server"
        ]
    }
]
profile=java-dev
resources {
cfg-repository {
    branch-name=master
    commit-id=HEAD
    password=sigma123
    url="http://localhost:9890/gitcontainer/demo-cfg"
    username=sadmin
}
databases=[
    {
        connection-string="oracle03:1522:si12c"
        name=cm
        password=coresmp601
        username=coresmp601cm
    },
    {
        connection-string="oracle03:1522:si12c"
        name=am
        password=coresmp601
        username=coresmp601am
    }
]
idp {
    url="https://sohanb:8097/idp"
}
keystores=[
    {
        file-location="/home/smp/runtime/ssl"
        name=identity
        passphrase=kspass
    }
]
admin {
    password=sigma123
    url="http://punws-sohanb.net:9002/"
    username=sadmin
}
}

From this HOCON file I want to extract the vm-args values. I have tried various bash tools and sed/awk commands, but with no luck.

Please suggest!


Solution

  • awk to the rescue!

     $ awk 'p&&$0~/"/{gsub("\"","");print} /vm-args/{p=1} ' hoconfile
    
                -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintClassHistogram,
                -XX:+UseConcMarkSweepGC -XX:+UseParNewGC ,
                -XX:+UseTLAB -XX:CMSInitiatingOccupancyFraction=80 -XX:+ExplicitGCInvokesConcurrent -verbose:gc,
                -XX:SurvivorRatio=8 -XX:+UseNUMA -XX:TargetSurvivorRatio=80 -XX:MaxTenuringThreshold=15,
                -Xmx3200m -Xms3200m -XX:NewSize=1664m -XX:MaxNewSize=1664m -Xss1024k,
                -server
    

    From there you can format the output as desired.

    UPDATE: with the updated input file, printing has to stop at the end of the vm-args list, otherwise later quoted lines are printed too. Add /]/{p=0} between the two blocks, as in:

    $ awk 'p&&$0~/"/{gsub("\"","");print} /]/{p=0} /vm-args/{p=1}' file
    

    You can pipe the output to tr -d ',' | tr -s ' ' to remove commas and squeeze repeated spaces, or do the same in the awk script.
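    For the second option, the cleanup can live in the awk script itself. This is only a sketch, assuming the layout shown in the question (one quoted entry per line, with the closing ] on its own line). The function name extract_vm_args is made up for illustration.

```shell
# Hypothetical helper: print each vm-args entry on its own line,
# with quotes, commas and surplus blanks removed inside awk.
# Usage: extract_vm_args FILE
extract_vm_args() {
    awk '
        p && /"/ {
            gsub(/[",]/, "")              # drop double quotes and commas
            gsub(/^[ \t]+|[ \t]+$/, "")   # trim leading and trailing blanks
            gsub(/[ \t]+/, " ")           # squeeze runs of blanks to one space
            print
        }
        /\]/      { p = 0 }               # closing bracket ends the list
        /vm-args/ { p = 1 }               # start printing from the next line
    ' "$1"
}
```

    Calling extract_vm_args hoconfile should print one cleaned entry per line, without the tr pipeline.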

    Explanation: a line matching "vm-args" sets the flag (p=1). While the flag is set, every line that contains a double quote is printed with the quotes removed. A line containing a closing square bracket (]) clears the flag (p=0), so printing stops until another "vm-args" line appears, if there is one.

    UPDATE: I changed the code slightly. It now looks for the node named myhostname, concatenates that node's vm-args lines into a single line, and exits at the closing bracket. The extra characters are trimmed with tr and sed.

    $ awk 'p && $0~/"/ {args=args $0 FS} 
           p && $0~/]/ {print args; exit} 
     /name=myhostname/ {h=1} 
        h && /vm-args/ {p=1}' file | 
     tr -d '",' | 
     tr -s ' ' | 
     sed 's/^ //'
    
    -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintClassHistogram -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+UseTLAB -XX:CMSInitiatingOccupancyFraction=80 -XX:+ExplicitGCInvokesConcurrent -verbose:gc -XX:SurvivorRatio=8 -XX:+UseNUMA -XX:TargetSurvivorRatio=80 -XX:MaxTenuringThreshold=15 -Xmx3200m -Xms3200m -XX:NewSize=1664m -XX:MaxNewSize=1664m -Xss1024k -server
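    If the host name should not be hard-coded, it can be passed in with awk -v. The sketch below is one possible variant, not a general HOCON parser. It assumes each node has an unquoted name=... line before its vm-args list and that the closing ] sits on its own line. The function name vm_args_for_host is hypothetical. Unlike the regex /name=myhostname/, it compares the whole name, so myhostname2 does not match myhostname, and the trimming is done in awk rather than with tr and sed.

```shell
# Hypothetical helper: print the vm-args of one node as a single line.
# Usage: vm_args_for_host FILE HOSTNAME
vm_args_for_host() {
    awk -v host="$2" '
        # Every name= line switches h on or off, so only the requested node counts.
        $1 ~ /^name=/ { h = ($1 == ("name=" host)) }

        # Closing bracket of the list being collected: print the result and stop.
        p && /\]/ { print args; exit }

        # Quoted entry inside the list: strip quotes and commas, then append
        # each field. Changing $0 makes awk re-split it into fields.
        p && /"/ {
            gsub(/[",]/, "")
            for (i = 1; i <= NF; i++)
                args = args (args == "" ? "" : " ") $i
        }

        # Start collecting on the line after vm-args, but only for the wanted node.
        h && /vm-args/ { p = 1 }
    ' "$1"
}
```

    For the file in the question, vm_args_for_host file myhostname should give the same single line as the pipeline above.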