I want to create a JSON array of EMR steps. So far I can build the JSON for a single step. Here is my bash code:
export source="s3a://sourcebucket"
export destination="s3a://destinationbucket"
EMR_DISTCP_STEPS=$( jq -n \
  --arg source "$source" \
  --arg destination "$destination" \
  '{
    "Name":"S3DistCp step",
    "HadoopJarStep": {
      "Args":["s3-dist-cp","--s3Endpoint=s3.amazonaws.com","--src=\($source)","--dest=\($destination)"],
      "Jar":"command-runner.jar"
    },
    "ActionOnFailure":"CONTINUE"
  }' )
Output:
echo $EMR_DISTCP_STEPS
[{ "Name": "S3DistCp step", "HadoopJarStep": { "Args": [ "s3-dist-cp", "--s3Endpoint=s3.amazonaws.com", "--src=s3a://sourcebucket", "--dest=s3a://destinationbucket" ], "Jar": "command-runner.jar" }, "ActionOnFailure": "CONTINUE" }]
Now I want to create a JSON array covering multiple sources and destinations. Desired output:
[{ "Name": "S3DistCp step", "HadoopJarStep": { "Args": [ "s3-dist-cp", "--s3Endpoint=s3.amazonaws.com", "--src=s3a://sourcebucket1", "--dest=s3a://destinationbucket1" ], "Jar": "command-runner.jar" }, "ActionOnFailure": "CONTINUE" },
{ "Name": "S3DistCp step", "HadoopJarStep": { "Args": [ "s3-dist-cp", "--s3Endpoint=s3.amazonaws.com", "--src=s3a://sourcebucket2", "--dest=s3a://destinationbucket2" ], "Jar": "command-runner.jar" }, "ActionOnFailure": "CONTINUE" },
{ "Name": "S3DistCp step", "HadoopJarStep": { "Args": [ "s3-dist-cp", "--s3Endpoint=s3.amazonaws.com", "--src=s3a://sourcebucket3", "--dest=s3a://destinationbucket3" ], "Jar": "command-runner.jar" }, "ActionOnFailure": "CONTINUE" }]
How can I generate a JSON array like this, with multiple sources and destinations, in Bash?
One way to do this is to define a jq function that generates your repeated structure, given the specific inputs you want to modify. Consider the following:
# generate this however you want to -- hardcoded, built by a loop, whatever.
source_dest_pairs=(
sourcebucket1:destinationbucket1
sourcebucket2:destinationbucket2
sourcebucket3:destinationbucket3
)
# -R accepts plain text, not JSON, as input; -n doesn't read any input automatically
# ...but instead lets "inputs" or "input" be used later in your jq code.
jq -Rn '
  def instructionsForPair($source; $dest): {
    "Name":"S3DistCp step",
    "HadoopJarStep": {
      "Args":[
        "s3-dist-cp",
        "--s3Endpoint=s3.amazonaws.com",
        "--src=\($source)",
        "--dest=\($dest)"
      ],
      "Jar":"command-runner.jar"
    }
  };
  [ inputs
    | capture("^(?<source>[^:]+):(?<dest>.*)$"; "")
    | select(.)
    | instructionsForPair(.source; .dest) ]
' < <(printf '%s\n' "${source_dest_pairs[@]}")
...correctly emits as output:
[
  {
    "Name": "S3DistCp step",
    "HadoopJarStep": {
      "Args": [
        "s3-dist-cp",
        "--s3Endpoint=s3.amazonaws.com",
        "--src=sourcebucket1",
        "--dest=destinationbucket1"
      ],
      "Jar": "command-runner.jar"
    }
  },
  {
    "Name": "S3DistCp step",
    "HadoopJarStep": {
      "Args": [
        "s3-dist-cp",
        "--s3Endpoint=s3.amazonaws.com",
        "--src=sourcebucket2",
        "--dest=destinationbucket2"
      ],
      "Jar": "command-runner.jar"
    }
  },
  {
    "Name": "S3DistCp step",
    "HadoopJarStep": {
      "Args": [
        "s3-dist-cp",
        "--s3Endpoint=s3.amazonaws.com",
        "--src=sourcebucket3",
        "--dest=destinationbucket3"
      ],
      "Jar": "command-runner.jar"
    }
  }
]
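If the pairs need to carry full URIs such as s3a://sourcebucket1, splitting on ":" becomes ambiguous, because the scheme itself contains a colon. The variant below is only a sketch of one way around that: it assumes the two URIs in each pair are separated by a tab (a choice made here for illustration, not something from the question), and it also includes the "ActionOnFailure" key from the desired output. The array name and bucket names are placeholders.

```shell
#!/usr/bin/env bash

# Hypothetical input: each entry is "<source URI><TAB><destination URI>".
# A tab is assumed as the separator because it cannot appear in "s3a://".
source_dest_pairs=(
  $'s3a://sourcebucket1\ts3a://destinationbucket1'
  $'s3a://sourcebucket2\ts3a://destinationbucket2'
  $'s3a://sourcebucket3\ts3a://destinationbucket3'
)

# -R reads raw text lines, -n defers reading to "inputs", -c prints compact JSON.
EMR_DISTCP_STEPS=$(
  printf '%s\n' "${source_dest_pairs[@]}" |
  jq -Rnc '
    [ inputs
      | split("\t")             # [source, destination]
      | select(length == 2)     # skip lines that are not a well-formed pair
      | {
          "Name": "S3DistCp step",
          "HadoopJarStep": {
            "Args": [
              "s3-dist-cp",
              "--s3Endpoint=s3.amazonaws.com",
              "--src=\(.[0])",
              "--dest=\(.[1])"
            ],
            "Jar": "command-runner.jar"
          },
          "ActionOnFailure": "CONTINUE"
        } ]
  '
)

printf '%s\n' "$EMR_DISTCP_STEPS"
```

Quoting "$EMR_DISTCP_STEPS" when printing or passing it on keeps the JSON intact as a single argument.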