replace characters in a json file

I've done some wrong manipulation of a 100 json files. Not sure what happened, but most of my json files now have a random number of the last characters repeated (as per image below). Is there a way to clean a json file by deleting characters starting from the last one, until the json file has returned into a clean json format ?

Solution

You can use regular expressions. An alternative would be string manipulation, but in this case regex is quicker to write, especially for one-time-use code.

import re

files = ['a.json','b.json',...] # populate as needed

for filename in files:
    with open(filename,'r') as file:
        content = file.read()
    
    new_content = re.match('([\s\S]+\}\]\}\})[\s\S]+?',content).group(1)
    
    with open(filename,'w') as file:
        file.write(new_content)

This regex has several parts. [\s\S] matches all characters (whereas . would not match newlines and some other characters). The greedy [\s\S]+ matches as much as possible, and the lazy [\s\S]+? matches as little as possible (in this case, the trailing text we don't want).

We then parenthesise the part we do want to keep, ([\s\S]+\}\]\}\}), and extract that using .group(1) and write this to the file.

For more information, see Reference - What does this regex mean?, and in future I would suggest manipulating JSON using the builtin json library.