Search code examples
python-3.xpandasdataframeaws-step-functions

Create data frame from json step exaction history aws


I have got step function execution history in JSON format

[{
    "timestamp": "2022-07-18T13:03:03.346000+00:00",
    "type": "ExecutionFailed",
    "id": 3,
    "previousEventId": 2,
    "executionFailedEventDetails": {
        "error": "States.Runtime",
        "cause": "An error occurred while executing the state 'Workflow Choice state' (entered at the event id #2). Invalid path '$.contributor_id': The choice state's condition path references an invalid value."
    }
}, {
    "timestamp": "2022-07-18T13:03:03.306000+00:00",
    "type": "ChoiceStateEntered",
    "id": 2,
    "previousEventId": 0,
    "stateEnteredEventDetails": {
        "name": "Workflow Choice state",
        "input": "{\n    \"Comment\": \"Insert your JSON here\"\n}",
        "inputDetails": {
            "truncated": false
        }
    }
}, {
    "timestamp": "2022-07-18T13:03:03.252000+00:00",
    "type": "ExecutionStarted",
    "id": 1,
    "previousEventId": 0,
    "executionStartedEventDetails": {
        "input": "{\n    \"Comment\": \"Insert your JSON here\"\n}",
        "inputDetails": {
            "truncated": false
        },
        "roleArn": "arn:aws:iam::asdfg:role/step-all"
    }
}]

We want to create a view like below enter image description here

The issue is i am not able to create executionFailedEventDetails ,stateEnteredEventDetails,executionStartedEventDetails as new row . It comes in first row only .

Step column is the name in the stateEnteredEventDetails

This is what i am doing

import json

import pandas as pd
from tabulate import tabulate

raw = r"""[{
    "timestamp": "2022-07-18T13:03:03.346000+00:00",
    "type": "ExecutionFailed",
    "id": 3,
    "previousEventId": 2,
    "executionFailedEventDetails": {
        "error": "States.Runtime",
        "cause": "An error occurred while executing the state 'Workflow Choice state' (entered at the event id #2). Invalid path '$.contributor_id': The choice state's condition path references an invalid value."
    }
}, {
    "timestamp": "2022-07-18T13:03:03.306000+00:00",
    "type": "ChoiceStateEntered",
    "id": 2,
    "previousEventId": 0,
    "stateEnteredEventDetails": {
        "name": "Workflow Choice state",
        "input": "{\n    \"Comment\": \"Insert your JSON here\"\n}",
        "inputDetails": {
            "truncated": false
        }
    }
}, {
    "timestamp": "2022-07-18T13:03:03.252000+00:00",
    "type": "ExecutionStarted",
    "id": 1,
    "previousEventId": 0,
    "executionStartedEventDetails": {
        "input": "{\n    \"Comment\": \"Insert your JSON here\"\n}",
        "inputDetails": {
            "truncated": false
        },
        "roleArn": "arn:aws:iam::asdfg:role/step-all"
    }
}]"""


data = json.loads(raw, strict=False)
data = pd.json_normalize(data)
# print(data.to_csv(), index=False)
print(tabulate(data, headers='keys', tablefmt='psql'))
data.to_csv('file.csv',encoding='utf-8', index=False)

and the output is

+----+----------------------------------+--------------------+------+-------------------+-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+----------------------------------------+---------------------------------------------------+----------------------------------------+-------------------------------------------------------+----------------------------------------+
|    | timestamp                        | type               |   id |   previousEventId | executionFailedEventDetails.error   | executionFailedEventDetails.cause                                                                                                                                                                 | stateEnteredEventDetails.name   | stateEnteredEventDetails.input         |   stateEnteredEventDetails.inputDetails.truncated | executionStartedEventDetails.input     |   executionStartedEventDetails.inputDetails.truncated | executionStartedEventDetails.roleArn   |
|----+----------------------------------+--------------------+------+-------------------+-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+----------------------------------------+---------------------------------------------------+----------------------------------------+-------------------------------------------------------+----------------------------------------|
|  0 | 2022-07-18T13:03:03.346000+00:00 | ExecutionFailed    |    3 |                 2 | States.Runtime                      | An error occurred while executing the state 'Workflow Choice state' (entered at the event id #2). Invalid path '$.contributor_id': The choice state's condition path references an invalid value. | nan                             | nan                                    |                                               nan | nan                                    |                                                   nan | nan                                    |
|  1 | 2022-07-18T13:03:03.306000+00:00 | ChoiceStateEntered |    2 |                 0 | nan                                 | nan                                                                                                                                                                                               | Workflow Choice state           | {                                      |                                                 0 | nan                                    |                                                   nan | nan                                    |
|    |                                  |                    |      |                   |                                     |                                                                                                                                                                                                   |                                 |     "Comment": "Insert your JSON here" |                                                   |                                        |                                                       |                                        |
|    |                                  |                    |      |                   |                                     |                                                                                                                                                                                                   |                                 | }                                      |                                                   |                                        |                                                       |                                        |
|  2 | 2022-07-18T13:03:03.252000+00:00 | ExecutionStarted   |    1 |                 0 | nan                                 | nan                                                                                                                                                                                               | nan                             | nan                                    |                                               nan | {                                      |                                                     0 | arn:aws:iam::asdfg:role/step-all       |
|    |                                  |                    |      |                   |                                     |                                                                                                                                                                                                   |                                 |                                        |                                                   |     "Comment": "Insert your JSON here" |                                                       |                                        |
|    |                                  |                    |      |                   |                                     |                                                                                                                                                                                                   |                                 |                                        |                                                   | }                                      |                                                       |                                        |
+----+----------------------------------+--------------------+------+-------------------+-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+----------------------------------------+---------------------------------------------------+----------------------------------------+-------------------------------------------------------+----------------------------------------+

The Details column 5th column is dynamic like all columns currently i have given example of only 3 event but it can go up to any number .

Final Expected Output

enter image description here


Solution

  • Given file.json:

    [{
        "timestamp": "2022-07-18T13:03:03.346000+00:00",
        "type": "ExecutionFailed",
        "id": 3,
        "previousEventId": 2,
        "executionFailedEventDetails": {
            "error": "States.Runtime",
            "cause": "An error occurred while executing the state 'Workflow Choice state' (entered at the event id #2). Invalid path '$.contributor_id': The choice state's condition path references an invalid value."
        }
    }, {
        "timestamp": "2022-07-18T13:03:03.306000+00:00",
        "type": "ChoiceStateEntered",
        "id": 2,
        "previousEventId": 0,
        "stateEnteredEventDetails": {
            "name": "Workflow Choice state",
            "input": "{\n    \"Comment\": \"Insert your JSON here\"\n}",
            "inputDetails": {
                "truncated": false
            }
        }
    }, {
        "timestamp": "2022-07-18T13:03:03.252000+00:00",
        "type": "ExecutionStarted",
        "id": 1,
        "previousEventId": 0,
        "executionStartedEventDetails": {
            "input": "{\n    \"Comment\": \"Insert your JSON here\"\n}",
            "inputDetails": {
                "truncated": false
            },
            "roleArn": "arn:aws:iam::asdfg:role/step-all"
        }
    }]
    

    Doing

    import pandas as pd
    
    df = pd.read_json('file.json')
    df = df.melt(['timestamp', 'type', 'id', 'previousEventId'], var_name='step', value_name='details').dropna()
    print(df.to_markdown(index=False))
    

    Output (Markdown was just the easiest for me to show all):

    | timestamp                        | type               |   id |   previousEventId | step                         | details                                                                                                                                                                                                                                   |
    |:---------------------------------|:-------------------|-----:|------------------:|:-----------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
    | 2022-07-18 13:03:03.346000+00:00 | ExecutionFailed    |    3 |                 2 | executionFailedEventDetails  | {'error': 'States.Runtime', 'cause': "An error occurred while executing the state 'Workflow Choice state' (entered at the event id #2). Invalid path '$.contributor_id': The choice state's condition path references an invalid value."} |
    | 2022-07-18 13:03:03.306000+00:00 | ChoiceStateEntered |    2 |                 0 | stateEnteredEventDetails     | {'name': 'Workflow Choice state', 'input': '{\n    "Comment": "Insert your JSON here"\n}', 'inputDetails': {'truncated': False}}                                                                                                          |
    | 2022-07-18 13:03:03.252000+00:00 | ExecutionStarted   |    1 |                 0 | executionStartedEventDetails | {'input': '{\n    "Comment": "Insert your JSON here"\n}', 'inputDetails': {'truncated': False}, 'roleArn': 'arn:aws:iam::asdfg:role/step-all'}                                                                                            |