python json formatting

Replace characters in a JSON file


I made a mistake while manipulating 100 JSON files. I'm not sure what happened, but most of them now have a random number of their last characters repeated at the end (as per image below). Is there a way to clean a JSON file by deleting characters from the end, one at a time, until the file is valid JSON again?

enter image description here



Solution

  • You can use regular expressions. An alternative would be string manipulation, but in this case regex is quicker to write, especially for one-time-use code.

    import re
    
    files = ['a.json','b.json',...] # populate as needed
    
    for filename in files:
        with open(filename,'r') as file:
            content = file.read()
        
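        # Keep everything up to the last '}]}}' that still has text after it.
        match = re.match(r'([\s\S]+\}\]\}\})[\s\S]+?', content)
        if match is None:
            continue  # pattern not found; leave this file untouched
        new_content = match.group(1)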
        
        with open(filename,'w') as file:
            file.write(new_content)
    

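    This regex has several parts. [\s\S] matches any character, including newlines (by default, . does not match newlines). The greedy [\s\S]+ matches as much as possible, so the group extends to the last }]}} in the file that is still followed by at least one character. The lazy [\s\S]+? then matches as little as possible, which is a single character of the trailing text we don't want. Because re.match only anchors at the start of the string, the rest of that text is simply left unmatched.

    We parenthesise the part we want to keep, ([\s\S]+\}\]\}\}), extract it with .group(1), and write it back to the file. Note that this assumes the valid JSON in every file ends with }]}}.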

    For more information, see "Reference - What does this regex mean?". In future, I would suggest manipulating JSON with the built-in json library.
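As a sketch of the json-library route, and closer to what the question literally asks for, the standard library can find where the valid JSON ends. This is not the regex method above but a swapped-in alternative: `json.JSONDecoder.raw_decode` parses one JSON value from the start of a string and returns the index where it stopped, ignoring whatever follows. The helper name `strip_trailing_garbage` and the sample string are hypothetical, and the approach assumes the original JSON is complete and the damage is only extra text appended after it.

```python
import json


def strip_trailing_garbage(text):
    """Return the first complete JSON value in text, dropping anything after it."""
    # raw_decode does not skip leading whitespace, so remove it first.
    stripped = text.lstrip()
    # raw_decode returns (parsed_object, end_index); only the index is needed.
    _, end = json.JSONDecoder().raw_decode(stripped)
    return stripped[:end]


# Hypothetical damaged content: valid JSON followed by a repeated "]}}".
damaged = '{"a": {"b": [{"c": 1}]}}]}}'
print(strip_trailing_garbage(damaged))  # {"a": {"b": [{"c": 1}]}}
```

To apply it to the files, read each file, pass its content through the helper, and write the result back, as in the loop above. If a file was truncated rather than extended, `raw_decode` raises `json.JSONDecodeError`, so such files can be caught and inspected by hand instead of being overwritten.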