
JSON to String to JSON Python


I have a workflow where output from one process is input to the next.

Process A outputs a JSON.

Process B needs a JSON as input.

However, since I pass the JSON as a command-line argument, it becomes a string.

The command below is not under my control: it is autogenerated by Nextflow. The solution need not involve JSON, but I do need to access these values, keeping in mind that the argument is essentially just a string.

python3.7 typing.py '{'id': '3283', 'code': '1234', 'task': '66128b3b-3440-4f71-9a6b-c788bc9f5d2c'}'

typing.py

import json
import sys

def download_this(task_as_string):
    print("Normal")
    print(task_as_string)

    first = json.dumps(task_as_string)
    print("after json.dumps")
    print(first)

    second = json.loads(first)
    print("after json.loads")
    print(second)
    print(type(second))

if __name__ == "__main__":
    download_this(sys.argv[1])

I thought calling json.dumps and then json.loads would make it work, but it does not.

Output

Normal
{id: 3283, code: 1234, task: 66128b3b-3440-4f71-9a6b-c788bc9f5d2c}
after json.dumps
"{id: 3283, code: 1234, task: 66128b3b-3440-4f71-9a6b-c788bc9f5d2c}"
after json.loads
{id: 3283, code: 1234, task: 66128b3b-3440-4f71-9a6b-c788bc9f5d2c}
<class 'str'>

And if I do print(second["task"]), I get a "string indices must be integers" error:

Traceback (most recent call last):
  File "typing.py", line 78, in <module>
    download_this(sys.argv[1])
  File "typing.py", line 55, in download_typed_hla
    print(second["task"])    
TypeError: string indices must be integers

So it was never converted to a dict in the first place. Any ideas how I can get around this problem?


Solution

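  • A couple of things:

    1. Your JSON is not properly formatted: JSON requires keys and string values to be enclosed in double quotes. On top of that, the single quotes inside your single-quoted shell argument are consumed by the shell, which is why the script receives {id: 3283, ...} with no quotes at all.
    2. You are passing in a stringified version of the JSON and then stringifying it further with json.dumps before loading it, so json.loads simply returns the original string. Load it directly instead.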
    import json
    
    def download_this(task_as_string):
        print("Normal")
        print(task_as_string)
    
        second = json.loads(task_as_string)
        print("after json.loads")
        print(second)
        print(type(second))
    
    download_this('{"id": "3283", "code": "1234", "task": "66128b3b-3440-4f71-9a6b-c788bc9f5d2c"}')
    
    Normal
    {"id": "3283", "code": "1234", "task": "66128b3b-3440-4f71-9a6b-c788bc9f5d2c"}
    after json.loads
    {'id': '3283', 'code': '1234', 'task': '66128b3b-3440-4f71-9a6b-c788bc9f5d2c'}
    <class 'dict'>
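
To illustrate the second point, here is a minimal sketch of why the dumps-then-loads round trip in the question returns a str. The variable names are illustrative only, and raw is assumed to hold the text the script receives in sys.argv[1], as shown in the question's output.

```python
import json

# Assumed value of sys.argv[1]: a plain str, not a dict.
raw = "{id: 3283, code: 1234, task: 66128b3b-3440-4f71-9a6b-c788bc9f5d2c}"

# json.dumps on a str produces a JSON string literal: the same text wrapped in quotes.
encoded = json.dumps(raw)

# json.loads undoes exactly that step, so the result is the original str again.
decoded = json.loads(encoded)

print(type(decoded))
print(decoded == raw)
```

Because json.loads only reverses what json.dumps did, the pair can never turn a str into a dict; the text itself has to be valid JSON for json.loads to produce one.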
    

    To get around your input problem, provided that you trust the input from Nextflow to conform to a simple dictionary-like structure, you could do something like this:

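    d = dict()
    # Strip the braces, then split into "key: value" groups on commas
    for group in task_as_string.replace('{', '').replace('}', '').split(','):
        # Split on the first colon only, so a value containing ':' stays intact
        key, value = group.split(':', 1)
        d[key.strip()] = value.strip()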
    
    print(d)
    print(type(d))
    
    python3 typing.py '{'id': '3283', 'code': '1234', 'task': '66128b3b-3440-4f71-9a6b-c788bc9f5d2c'}'
    {'id': '3283', 'code': '1234', 'task': '66128b3b-3440-4f71-9a6b-c788bc9f5d2c'}
    <class 'dict'>
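
For anyone wondering why the script sees no quotes at all for a command like this, the following sketch may help. It uses the standard library's shlex.split, which follows POSIX-style quoting rules, as a stand-in for what the shell does; an actual shell may differ in details that do not matter for this command.

```python
import shlex

# The command from the question, as it would be typed into a POSIX shell.
command = (
    "python3.7 typing.py "
    "'{'id': '3283', 'code': '1234', 'task': '66128b3b-3440-4f71-9a6b-c788bc9f5d2c'}'"
)

# Each inner single quote closes or reopens a quoted section, so the quote
# characters are consumed and the adjacent pieces are joined into one argument.
argv = shlex.split(command)

print(len(argv))
print(argv[2])
```

The third element is the text the script receives as sys.argv[1]: the braces, colons, and commas survive, but every quote character is gone, which is why json.loads cannot parse it.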
    

    If the JSON coming from Nextflow is more complicated (for example, with nesting and/or lists), then you'll have to come up with a more suitable parsing mechanism.
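
One possible direction for such a mechanism, offered only as a sketch: put double quotes back around every bare token and let json.loads handle the structure. The helper name parse_unquoted and the nested sample input are hypothetical, and the approach assumes that no key or value contains whitespace, commas, colons, braces, or brackets. Every value comes back as a string.

```python
import json
import re

# Hypothetical helper pattern: a bare token is any run of characters that is
# not whitespace and not one of the structural characters { } [ ] , :
BARE_TOKEN = re.compile(r'[^\s{}\[\],:]+')

def parse_unquoted(text):
    """Quote every bare token, then parse the result as ordinary JSON."""
    quoted = BARE_TOKEN.sub(lambda m: json.dumps(m.group(0)), text)
    return json.loads(quoted)

# The flat input from the question, as the script receives it.
flat = parse_unquoted("{id: 3283, code: 1234, task: 66128b3b-3440-4f71-9a6b-c788bc9f5d2c}")
print(flat["task"])

# A made-up nested input, to show that lists and nested objects survive.
nested = parse_unquoted("{id: 3283, tags: [a, b], meta: {code: 1234}}")
print(nested["meta"]["code"])
```

If the values can contain spaces or punctuation, this breaks down, and the more reliable fix is to change how the argument is quoted or passed on the Nextflow side so that valid JSON reaches the script.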