Python recursion getting called twice


I am extracting the value of the "text" key from a JSON string, but the recursive call happens twice and I get the output two times. I added several print statements for debugging.

I need help debugging the process_data function, where the call is repeated.

import json

def process_data(obj, content):
    # texts = []
    print("inside process data", obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            print("key",key)
            if key == "textValue":
                print("processing textValue")
                textval_str = json.dumps(value)
                textval = json.loads(textval_str)
                print("textVal", textval)
                process_data(textval, content)
            if key == "text":
                print("processing text", value)
                content.append(value)
                return
            else:
                process_data(value, content)
    elif isinstance(obj, list):
        print("list")
        for item in obj:
            process_data(item, content)
    return content

def from_adf(x):
    try:
        if x is not None:            
            cont = []
            adf_text = process_data(x, cont)
            print(len(cont))
            return adf_text
        else:
            return None
    except Exception as e:
        print(e)
        return x

if __name__ == '__main__':
    print(1)
    json_str = '''{"textValue": "{\"type\":null,\"content\":[{\"content\":[{\"type\":\"text\",\"text\":\"PAAT \"}]}]}"}'''
    op =  json_str.replace('"{', '{').replace('}"', '}').replace('\\"', '\\\\\\"')
    opls = json.loads(op, strict=False)
    output = from_adf(opls)
    print(output)

Solution

  • Besides the fact that json_str is not valid JSON (probably a misunderstanding of how escaping works in string literals when you wrote this post), the repeated calls come from your if..else chain:

                if key == "textValue":
                    ...
                if key == "text":
                    ...
                else:
                    ...

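    The else belongs only to the second if, so the final else block is also executed when the first if block was executed. For the key textValue, process_data is therefore called once in the first block and once more in the else block. This is not what you want. Change it to: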

                if key == "textValue":
                    ...
                elif key == "text":
                    ...
                else:
                    ...
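
The difference between the two layouts can be checked in isolation. The sketch below is only an illustration: the two helper functions are hypothetical and just record which branches run for a given key, they are not part of your code.

```python
def branches_if_if_else(key):
    """Record the branches taken by the original if / if / else layout."""
    taken = []
    if key == "textValue":
        taken.append("textValue")
    if key == "text":   # starts a new, independent chain
        taken.append("text")
    else:               # paired only with the second if
        taken.append("else")
    return taken


def branches_if_elif_else(key):
    """Record the branches taken by the corrected if / elif / else layout."""
    taken = []
    if key == "textValue":
        taken.append("textValue")
    elif key == "text":
        taken.append("text")
    else:
        taken.append("else")
    return taken


for key in ("textValue", "text", "type"):
    print(key, branches_if_if_else(key), branches_if_elif_else(key))
```

For the key textValue the first layout records two branches (textValue and else), which corresponds to the two recursive calls in the question. The second layout records exactly one branch for every key.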
    

    Some other remarks:

    • Your code should not call json.dumps and then json.loads on the same value. That round trip is useless, as you already parsed the input as JSON. So that first if case isn't really needed; just deal with it in the else block. On the other hand, if you expect the textValue key to occur at deeper levels too, and expect its value to be a JSON-encoded string, then you should not call dumps, only loads.

    • There should not be any tinkering with the JSON string (replacing backslashes) in your code. If there is a JSON-encoded string that is encoded again as JSON in the larger context, then just decode those parts in sequence without any other string manipulation.

    • Instead of collecting values in a list that is given as parameter, consider using a generator function for this purpose.
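
The first two remarks can be illustrated with a small sketch. The sample data here is hypothetical, merely shaped like the structure in the question, and the variable names are made up for the demonstration.

```python
import json

# Hypothetical sample data, shaped like the structure in the question.
parsed = {"type": None, "content": [{"type": "text", "text": "PAAT "}]}

# dumps followed by loads hands back an equal object: nothing is gained.
round_tripped = json.loads(json.dumps(parsed))
print(round_tripped == parsed)

# The same holds for a string that contains JSON text: the round trip
# returns the string unchanged instead of decoding it.
encoded = json.dumps(parsed)
still_a_string = json.loads(json.dumps(encoded))
print(type(still_a_string).__name__)

# A doubly encoded document is decoded in sequence, outer layer first,
# without any replace() calls on the text.
document = json.dumps({"textValue": encoded})
outer = json.loads(document)
inner = json.loads(outer["textValue"])
print(inner == parsed)
```

This prints True, str and True. Note that the round trip is only guaranteed to be a no-op for values that came out of json.loads in the first place; other Python objects, such as tuples, do not survive it unchanged.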

    I suppose your real input has backslashes (nested JSON encoding), so use the r-string syntax (so the backslashes are taken as literals), and then continue as follows:

    import json
    
    def process_data(obj):
        if isinstance(obj, dict):
            for key, value in obj.items():
                if key == "text":
                    yield value
                else:
                    yield from process_data(value)
        elif isinstance(obj, list):
            for item in obj:
                yield from process_data(item)
    
    # example input is now valid JSON (using r-string syntax):
    json_str = r'''{"textValue": "{\"type\":null,\"content\":[{\"content\":[{\"type\":\"text\",\"text\":\"PAAT \"}]}]}"}'''
    # don't tinker with this string. Use only the JSON parser
    opls = json.loads(json_str)
    opls["textValue"] = json.loads(opls["textValue"])
    # get result from generator, and convert to list
    output = list(process_data(opls))
    print(output)
    

    Or, if you have more occurrences of nested JSON encoded strings, when the key is textValue, then keep the first if, but don't encode the value with dumps -- you (only) need to decode it:

    import json
    
    def process_data(obj):
        if isinstance(obj, dict):
            for key, value in obj.items():
                if key == "textValue":  # assume the value is JSON encoded:
                    yield from process_data(json.loads(value))
                elif key == "text":
                    yield value
                else:
                    yield from process_data(value)
        elif isinstance(obj, list):
            for item in obj:
                yield from process_data(item)
    
    # example input is now valid JSON (using r-string syntax):
    json_str = r'''{"textValue": "{\"type\":null,\"content\":[{\"content\":[{\"type\":\"text\",\"text\":\"PAAT \"}]}]}"}'''
    # don't tinker with this string. Use only the JSON parser
    opls = json.loads(json_str)
    # get result from generator, and convert to list
    output = list(process_data(opls))
    print(output)
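
As a side note on the r prefix used for json_str: the sketch below, with a shortened hypothetical input, shows what Python does to the backslashes when the prefix is missing. This is my assumption about what went wrong with the literal in the question.

```python
import json

# Without the r prefix, Python itself consumes each \" and leaves a bare ".
plain = '''{"textValue": "{\"type\":null}"}'''
# With the r prefix, the backslashes stay in the string and reach the JSON parser.
raw = r'''{"textValue": "{\"type\":null}"}'''

print(plain)
print(raw)

# The plain literal is no longer valid JSON, because the inner quotes
# terminate the outer string value too early.
try:
    json.loads(plain)
    plain_is_valid = True
except json.JSONDecodeError:
    plain_is_valid = False
print(plain_is_valid)

# The raw literal decodes cleanly in two steps.
outer = json.loads(raw)
inner = json.loads(outer["textValue"])
print(inner)
```

If the real input comes from a file or an API response instead of a string literal in the source code, no prefix is needed, since the backslashes are then already part of the data.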