I created a state machine to run some Glue/ETL jobs in parallel. I'm experimenting with the Map state to take advantage of dynamic parallelism. Here is the step function definition:
{
  "StartAt": "Map",
  "States": {
    "Map": {
      "Type": "Map",
      "InputPath": "$.data",
      "ItemsPath": "$.array",
      "MaxConcurrency": 2,
      "Iterator": {
        "StartAt": "glue Job",
        "States": {
          "glue Job": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "End": true,
            "Parameters": {
              "JobName": "glue-etl-job",
              "Arguments": {
                "--db": "db-dev",
                "--file": "$.file",
                "--bucket": "$.bucket"
              }
            }
          }
        }
      },
      "Catch": [
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "NotifyError"
        }
      ],
      "Next": "NotifySuccess"
    },
  }
}
The input passed to the step function looks like this:
{
  "data": {
    "array": [
      {"file": "path-to-file1", "bucket": "bucket-name1"},
      {"file": "path-to-file2", "bucket": "bucket-name2"}
    ]
  }
}
The problem is that the file and bucket job arguments don't get resolved; they are passed to the Glue job as the literal strings $.file and $.bucket. How can I pass the actual values from the input?
You need to append '.$' to the parameter name when its value is a path into the state input. Without that suffix, the value is passed through as a literal string.
"--file.$": "$.file",
"--bucket.$": "$.bucket"
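Applied to the task state from the question, the Parameters block would look roughly like the sketch below. It assumes the Map state hands each element of $.array to the iterator as its input, which is what happens when the Map state defines no per-item Parameters of its own. The static values ("glue-etl-job", "db-dev") are copied unchanged from the question.

```json
{
  "JobName": "glue-etl-job",
  "Arguments": {
    "--db": "db-dev",
    "--file.$": "$.file",
    "--bucket.$": "$.bucket"
  }
}
```

For the first array element in the sample input, the '.$' suffix is stripped from each key and the path is replaced by the value it selects, so the Glue job should receive something like:

```json
{
  "JobName": "glue-etl-job",
  "Arguments": {
    "--db": "db-dev",
    "--file": "path-to-file1",
    "--bucket": "bucket-name1"
  }
}
```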
For a complete guide, see the Parameters section of the Amazon States Language spec: https://states-language.net/spec.html#parameters