
How to create a JSON array in Bash


I want to create a JSON array of EMR steps. So far I can build the array for a single step. Here is my Bash code:

export source="s3a://sourcebucket"
export destination="s3a://destinationbucket"

EMR_DISTCP_STEPS=$( jq -n \
                  --arg source "$source" \
                  --arg destination "$destination" \
                  '[{
                    "Name":"S3DistCp step",
                    "HadoopJarStep": {
                      "Args":["s3-dist-cp","--s3Endpoint=s3.amazonaws.com","--src=\($source)","--dest=\($destination)"],
                      "Jar":"command-runner.jar"
                    },
                    "ActionOnFailure":"CONTINUE"
                  }]' )

Output:

echo $EMR_DISTCP_STEPS

[{ "Name": "S3DistCp step", "HadoopJarStep": { "Args": [ "s3-dist-cp", "--s3Endpoint=s3.amazonaws.com", "--src=s3a://sourcebucket", "--dest=s3a://destinationbucket" ], "Jar": "command-runner.jar" }, "ActionOnFailure": "CONTINUE" }]

Now I want to create a JSON array covering multiple sources and destinations. Desired output:

[{ "Name": "S3DistCp step", "HadoopJarStep": { "Args": [ "s3-dist-cp", "--s3Endpoint=s3.amazonaws.com", "--src=s3a://sourcebucket1", "--dest=s3a://destinationbucket1" ], "Jar": "command-runner.jar" }, "ActionOnFailure": "CONTINUE" },
{ "Name": "S3DistCp step", "HadoopJarStep": { "Args": [ "s3-dist-cp", "--s3Endpoint=s3.amazonaws.com", "--src=s3a://sourcebucket2", "--dest=s3a://destinationbucket2" ], "Jar": "command-runner.jar" }, "ActionOnFailure": "CONTINUE" },
{ "Name": "S3DistCp step", "HadoopJarStep": { "Args": [ "s3-dist-cp", "--s3Endpoint=s3.amazonaws.com", "--src=s3a://sourcebucket3", "--dest=s3a://destinationbucket3" ], "Jar": "command-runner.jar" }, "ActionOnFailure": "CONTINUE" }]

How can I generate a JSON array (as a JSON string) with multiple sources and destinations in Bash?


Solution

  • One way to do this is to provide a jq function that generates your repeated structure, given the specific inputs you want to modify. Consider the following:

    # generate this however you want to -- hardcoded, built by a loop, whatever.
    source_dest_pairs=(
      sourcebucket1:destinationbucket1
      sourcebucket2:destinationbucket2
      sourcebucket3:destinationbucket3
    )
    
    # -R accepts plain text, not JSON, as input; -n doesn't read any input automatically
    # ...but instead lets "inputs" or "input" be used later in your jq code.
    jq -Rn '
      def instructionsForPair($source; $dest): {
        "Name":"S3DistCp step",
        "HadoopJarStep": {
          "Args":[
            "s3-dist-cp",
            "--s3Endpoint=s3.amazonaws.com",
            "--src=\($source)",
            "--dest=\($dest)"
          ],
          "Jar":"command-runner.jar"
        }
      };
    
      [ inputs 
      | capture("^(?<source>[^:]+):(?<dest>.*)$"; "")
      | select(.)
      | instructionsForPair(.source; .dest) ]
    ' < <(printf '%s\n' "${source_dest_pairs[@]}")
    

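    ...emits the following output (the bucket names appear without the s3a:// prefix because the input pairs omit it, and the function above does not set ActionOnFailure):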

    [
      {
        "Name": "S3DistCp step",
        "HadoopJarStep": {
          "Args": [
            "s3-dist-cp",
            "--s3Endpoint=s3.amazonaws.com",
            "--src=sourcebucket1",
            "--dest=destinationbucket1"
          ],
          "Jar": "command-runner.jar"
        }
      },
      {
        "Name": "S3DistCp step",
        "HadoopJarStep": {
          "Args": [
            "s3-dist-cp",
            "--s3Endpoint=s3.amazonaws.com",
            "--src=sourcebucket2",
            "--dest=destinationbucket2"
          ],
          "Jar": "command-runner.jar"
        }
      },
      {
        "Name": "S3DistCp step",
        "HadoopJarStep": {
          "Args": [
            "s3-dist-cp",
            "--s3Endpoint=s3.amazonaws.com",
            "--src=sourcebucket3",
            "--dest=destinationbucket3"
          ],
          "Jar": "command-runner.jar"
        }
      }
    ]
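
    As a variation on the approach above, here is a minimal sketch that passes each source and destination as separate positional arguments via `--args` (this needs jq 1.6 or newer), so URIs that contain colons, such as `s3a://...`, need no delimiter parsing. The function name `build_distcp_steps` is hypothetical, and including `ActionOnFailure` is an assumption taken from the desired output in the question.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not from the original answer).
# Usage: build_distcp_steps SRC1 DEST1 [SRC2 DEST2 ...]
build_distcp_steps() {
  jq -n '
    # Build one step object from a source/destination pair.
    def step($src; $dest): {
      "Name": "S3DistCp step",
      "HadoopJarStep": {
        "Args": [
          "s3-dist-cp",
          "--s3Endpoint=s3.amazonaws.com",
          "--src=\($src)",
          "--dest=\($dest)"
        ],
        "Jar": "command-runner.jar"
      },
      # Assumption: copied from the desired output in the question.
      "ActionOnFailure": "CONTINUE"
    };

    # $ARGS.positional holds everything passed after --args, as strings.
    # Walk it two items at a time: even index = source, odd index = destination.
    $ARGS.positional as $a
    | [ range(0; $a | length; 2) as $i | step($a[$i]; $a[$i + 1]) ]
  ' --args "$@"
}

EMR_DISTCP_STEPS=$(build_distcp_steps \
  "s3a://sourcebucket1" "s3a://destinationbucket1" \
  "s3a://sourcebucket2" "s3a://destinationbucket2" \
  "s3a://sourcebucket3" "s3a://destinationbucket3")

printf '%s\n' "$EMR_DISTCP_STEPS"
```

    Because the values never pass through shell string interpolation or a regex, quoting stays simple: jq receives each URI as a plain string and escapes it as needed when building the JSON.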