I am extracting the value of the "text" key from a JSON string, but the recursive call happens twice and I get the output two times. I added several print statements for debugging purposes.
I need help debugging the process_data method, where the call is repeated.
def process_data(obj, content):
    # texts = []
    print("inside process data", obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            print("key",key)
            if key == "textValue":
                print("processing textValue")
                textval_str = json.dumps(value)
                textval = json.loads(textval_str)
                print("textVal", textval)
                process_data(textval, content)
            if key == "text":
                print("processing text", value)
                content.append(value)
                return
            else:
                process_data(value, content)
    elif isinstance(obj, list):
        print("list")
        for item in obj:
            process_data(item, content)
    return content
def from_adf(x):
    try:
        if x is not None:
            cont = []
            adf_text = process_data(x, cont)
            print(len(cont))
            return adf_text
        else:
            return None
    except Exception as e:
        print(e)
        return x
if __name__ == '__main__':
    print(1)
    json_str = '''{"textValue": "{\"type\":null,\"content\":[{\"content\":[{\"type\":\"text\",\"text\":\"PAAT \"}]}]}"}'''
    op = json_str.replace('"{', '{').replace('}"', '}').replace('\\"', '\\\\\\"')
    opls = json.loads(op, strict=False)
    output = from_adf(opls)
    print(output)
Besides the fact that json_str is not valid JSON (probably a misunderstanding of how escaping works in string literals when you made this post), the problem of the repeated calls is located in your if..else chain:
if key == "textValue":
    ...
if key == "text":
    ...
else:
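This means the final else block will also be executed when the first if block was executed: the second if starts a new chain, so for the key textValue the value is processed once in the first if block and once more in the else block. This is not what you want. Change it to: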
if key == "textValue":
    ...
elif key == "text":
    ...
else:
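The difference between the two chains can be checked in isolation. The following is only a minimal sketch: visited_branches and its use_elif flag are hypothetical names made up for this illustration, and the function only records which branches run instead of doing any real processing.

```python
def visited_branches(key, use_elif):
    """Return the names of the branches that run for the given key."""
    branches = []
    if use_elif:
        # One single chain: at most one branch runs.
        if key == "textValue":
            branches.append("textValue")
        elif key == "text":
            branches.append("text")
        else:
            branches.append("else")
    else:
        # Two separate chains: the else belongs to the second if only.
        if key == "textValue":
            branches.append("textValue")
        if key == "text":
            branches.append("text")
        else:
            branches.append("else")
    return branches


print(visited_branches("textValue", use_elif=False))  # ['textValue', 'else']
print(visited_branches("textValue", use_elif=True))   # ['textValue']
```

With the two separate chains, the key textValue reaches both its own branch and the else branch, which is where the second recursive call comes from.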
Some other remarks:
Your code should not do json.dumps and then json.loads on the value. This is useless: the input was already parsed as JSON, and encoding a value only to decode it again gives back the same value. So that first if case isn't really needed; just deal with it in the else block. On the other hand, if you expect the textValue key to occur in deeper levels also, and expect it to have encoded JSON, then you should not call dumps, but only loads.
There should not be any tinkering with the JSON string (replacing backslashes) in your code. If there is a JSON-encoded string that is encoded again as JSON in the larger context, then just decode those parts in sequence without any other string manipulation.
Instead of collecting values in a list that is given as parameter, consider using a generator function for this purpose.
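To illustrate the first two remarks, here is a small sketch. The sample value is made up for the illustration and is not your real data; the sketch only assumes that the nested part was encoded as JSON once more inside the outer document.

```python
import json

# Hypothetical sample value, already parsed (a plain Python object).
value = {"type": None, "content": [{"type": "text", "text": "PAAT "}]}

# dumps followed by loads is a round trip: it gives back an equal value.
round_tripped = json.loads(json.dumps(value))
print(round_tripped == value)  # True

# Nested encoding: the inner JSON text is itself stored as a JSON string.
inner = json.dumps(value)
outer = json.dumps({"textValue": inner})

# Decode layer by layer with the parser only, no string replacements.
decoded = json.loads(outer)
decoded["textValue"] = json.loads(decoded["textValue"])
print(decoded["textValue"] == value)  # True
```

Each layer of encoding is undone by exactly one call to json.loads.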
I suppose your real input has backslashes (nested JSON encoding), so use the r-string syntax (so the backslashes are taken as literals), and then continue as follows:
import json

def process_data(obj):
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == "text":
                yield value
            else:
                yield from process_data(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from process_data(item)

# example input is now valid JSON (using r-string syntax):
json_str = r'''{"textValue": "{\"type\":null,\"content\":[{\"content\":[{\"type\":\"text\",\"text\":\"PAAT \"}]}]}"}'''
# don't tinker with this string. Use only the JSON parser
opls = json.loads(json_str)
opls["textValue"] = json.loads(opls["textValue"])
# get result from generator, and convert to list
output = list(process_data(opls))
print(output)
Or, if you have more occurrences of nested JSON-encoded strings under the key textValue, then keep the first if, but don't encode the value with dumps: you (only) need to decode it:
import json

def process_data(obj):
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == "textValue":  # assume the value is JSON encoded:
                yield from process_data(json.loads(value))
            elif key == "text":
                yield value
            else:
                yield from process_data(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from process_data(item)

# example input is now valid JSON (using r-string syntax):
json_str = r'''{"textValue": "{\"type\":null,\"content\":[{\"content\":[{\"type\":\"text\",\"text\":\"PAAT \"}]}]}"}'''
# don't tinker with this string. Use only the JSON parser
opls = json.loads(json_str)
# get result from generator, and convert to list
output = list(process_data(opls))
print(output)
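As a side note on why the r-string syntax matters here, the sketch below compares a normal string literal with a raw one. The short sample document is made up for the illustration. In a normal literal Python itself consumes the backslashes, so the JSON parser never sees them.

```python
import json

# Same characters in the source code, but only the raw literal keeps the backslashes.
plain = '{"textValue": "{\"a\": 1}"}'
raw = r'{"textValue": "{\"a\": 1}"}'

print(plain)  # {"textValue": "{"a": 1}"}
print(raw)    # {"textValue": "{\"a\": 1}"}

# Without the backslashes the inner quotes end the string early: invalid JSON.
try:
    json.loads(plain)
    plain_is_valid = True
except json.JSONDecodeError:
    plain_is_valid = False
print(plain_is_valid)  # False

# The raw literal is valid JSON and can be decoded in two steps.
outer = json.loads(raw)
inner = json.loads(outer["textValue"])
print(inner)  # {'a': 1}
```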