
Regular Expression to remove selective string


I want to remove a particular substring that occurs inside a JSON string.

For example, my JSON string is:

{"tableName":"avzConf","rows":[{"Comp":"mster","Conf": "[{\"name\": \"state\", \"dispN\": \"c_d_test\", \"\": {\"updated_at\": \"2020-09-16T06:33:07.684504Z\", \"updated_by\": \"Abc_xyz<abc_xyz@uuvvww.com>\"}}, {\"name\": \"stClu\", \"dNme\": \"tab(s) Updatedd\", \"\": {\"updated_at\": \"2020-09-21T10:17:48.307874Z\", \"updated_by\": \"Def Ghi<def_ghi@uuvvww.com>\"}}
}]
}

I want to remove: \"\": {\"updated_at\": \"2020-09-16T06:33:07.684504Z\", \"updated_by\": \"Abc_xyz<abc_xyz@uuvvww.com>\"}

Expected output:

{"tableName":"avzConf","rows":[{"Comp":"mster","Conf": "[{\"name\": \"state\", \"dispN\": \"c_d_test\"}, {\"name\": \"stClu\", \"dNme\": \"tab(s) Updatedd\"}
}]
}

I tried the pattern ( \\"\\": {\\"updated_\w+)(.*)(>\\")

and used this in my code:

import re

line = re.sub(r"updated_\w+(.*)(.com>)", '', json_str)

But it also matches everything in between, because there are two occurrences of \"\": {\"updated_at\" and \"updated_by\".

It also leaves behind the residue "": {""}

How can I completely remove \"\": {\"updated_at\": \"2020-09-16T06:33:07.684504Z\", \"updated_by\": \"Abc_xyz<abc_xyz@uuvvww.com>\"}?
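
The over-match can be reproduced on a shortened sample. The sketch below assumes a cut-down string `sample` (hypothetical, not the full payload) and compares the greedy `(.*)` from the attempt above with a lazy `(.*?)`; the variable names are illustrative only.

```python
import re

# Hypothetical, shortened stand-in for the payload: two blocks that each
# contain "updated_at" ... "updated_by" ... ".com>".
sample = (
    r'A {\"updated_at\": \"t1\", \"updated_by\": \"x<x@uuvvww.com>\"} '
    r'B {\"updated_at\": \"t2\", \"updated_by\": \"y<y@uuvvww.com>\"} C'
)

# Greedy: (.*) runs to the LAST ".com>", so a single match spans both blocks.
greedy = re.findall(r"updated_\w+(.*)(.com>)", sample)

# Lazy: (.*?) stops at the FIRST ".com>" after each "updated_".
lazy = re.findall(r"updated_\w+(.*?)(\.com>)", sample)

print(len(greedy))  # 1
print(len(lazy))    # 2
```

Even in the lazy form, this pattern starts at `updated_` and ends at `.com>`, so the surrounding `\"\": {\"` and `\"}` are not part of the match, which would explain the leftover `"": {""}`.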


Solution

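  • Working on the JSON string directly, the unwanted empty-key entries can be removed with a regex. Excluding both { and } from the character class keeps each match inside a single nested object, so the two occurrences are removed separately. Matching the leading comma and the closing brace of the nested object means no "": {""} residue is left and the remaining JSON keeps its shape.

    regex: ,\s*\\"\\":\s*\{\\"updated_at[^{}]*\}

    import re

    # Raw string, so the backslash-escaped quotes of the embedded JSON stay literal.
    json_str = r'''{"tableName":"avzConf","rows":[{"Comp":"mster","Conf": "[{\"name\": \"state\", \"dispN\": \"c_d_test\", \"\": {\"updated_at\": \"2020-09-16T06:33:07.684504Z\", \"updated_by\": \"Abc_xyz<abc_xyz@uuvvww.com>\"}}, {\"name\": \"stClu\", \"dNme\": \"tab(s) Updatedd\", \"\": {\"updated_at\": \"2020-09-21T10:17:48.307874Z\", \"updated_by\": \"Def Ghi<def_ghi@uuvvww.com>\"}} }] }'''

    # Remove `, \"\": {\"updated_at ... }` up to the first closing brace only.
    line = re.sub(r',\s*\\"\\":\s*\{\\"updated_at[^{}]*\}', '', json_str)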
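
As an alternative to a regex, the payload could be parsed with the standard `json` module and the empty key dropped from each entry. This is only a sketch under one loud assumption: the payload must be well-formed JSON, whereas the sample in the question never closes the "Conf" string, so the `json_str` below is an assumed corrected variant ending in `]"}]}`.

```python
import json

# ASSUMPTION: a well-formed variant of the payload, with "Conf" closed by ]".
json_str = r'''{"tableName":"avzConf","rows":[{"Comp":"mster","Conf": "[{\"name\": \"state\", \"dispN\": \"c_d_test\", \"\": {\"updated_at\": \"2020-09-16T06:33:07.684504Z\", \"updated_by\": \"Abc_xyz<abc_xyz@uuvvww.com>\"}}, {\"name\": \"stClu\", \"dNme\": \"tab(s) Updatedd\", \"\": {\"updated_at\": \"2020-09-21T10:17:48.307874Z\", \"updated_by\": \"Def Ghi<def_ghi@uuvvww.com>\"}}]"}]}'''

data = json.loads(json_str)
for row in data["rows"]:
    conf = json.loads(row["Conf"])   # "Conf" is itself a JSON-encoded string
    for entry in conf:
        entry.pop("", None)          # drop the empty-key object if present
    row["Conf"] = json.dumps(conf)   # re-encode the list back into a string

cleaned = json.dumps(data)
print(cleaned)
```

Key order is preserved, but whitespace in `cleaned` may differ from the original string, so this fits best when the exact formatting does not matter.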