How to modify JSON files inside a zip file using a shell script, Ant script, Java code, or Python code? Which of these is efficient and easy?

I need to modify the JSON content of JSON files that are inside a zip file.

The hierarchy of zip file is:

abc.zip

folder1
dbconfig
- config1.xml
- config2.xml
documentsfolder
- jsonobjects.zip
  - JSONfolder
    - all the JSON files, each containing JSON objects

I need to convert the JSON objects from their existing format to a different one.

The existing JSON object that I have inside the json files is:

{
 "title":"xyz", 
 "type":"object",
 "properties":{ "id":"123", "name":"xyz"}
}

The new content that should replace it inside the JSON files is:

{
  "name":"xyz",
  "type":"JSON",
  "schema":{ // entire content of the existing JSON should be there in schema
    "title":"xyz",
    "type":"object",
    "properties":{ "id":"123", "name":"xyz"}
  },
  "owner":"jack"
}

Which is the simplest and most efficient way to complete this task (shell script, Python script, or Java code)?

Solution

Since the file is tiny (below 1 MB), the simple Python script below is quite enough; it will be very fast for such small data.

I've used zipfile.ZIP_DEFLATED compression, which is the most common for zip files. You can use zipfile.ZIP_STORED instead if you need the files stored uncompressed, or zipfile.ZIP_BZIP2 or zipfile.ZIP_LZMA for other compression algorithms. The compression type only needs to be set for the output (processed) zip; the compression of the input zip is detected automatically. zipfile, json and io are standard Python modules, so there is nothing to install for them.
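To make the compression choice concrete, here is a minimal sketch comparing ZIP_STORED and ZIP_DEFLATED on an in-memory zip. The payload and the entry name sample.json are made up for illustration, and zipped_size is a hypothetical helper, not part of the script.

```python
import io
import zipfile

# Hypothetical payload, chosen only because repeated text compresses well.
payload = b'{"title": "xyz", "type": "object"}' * 200

def zipped_size(method):
    """Return the size in bytes of an in-memory zip holding the payload."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, mode='w', compression=method) as zf:
        zf.writestr('sample.json', payload)
    return len(buf.getvalue())

stored_size = zipped_size(zipfile.ZIP_STORED)
deflated_size = zipped_size(zipfile.ZIP_DEFLATED)
print('stored:', stored_size, 'deflated:', deflated_size)

# Reading needs no compression argument: each entry records its own method.
deflated_buf = io.BytesIO()
with zipfile.ZipFile(deflated_buf, mode='w', compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('sample.json', payload)
with zipfile.ZipFile(io.BytesIO(deflated_buf.getvalue())) as zf:
    info = zf.getinfo('sample.json')
    roundtrip = zf.read('sample.json')
print(info.compress_type == zipfile.ZIP_DEFLATED)
```

The compression argument only affects how entries are written; on reading, zipfile picks the method up from each entry's header.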

fin.zip and fout.zip are example names for the input and output zip files. The script builds a new output (processed) zip rather than patching the input zip in place, because the JSON files change size and the input zip may be compressed, so its contents have to be repacked and recompressed into another zip. The input and output file names may also be the same, in which case the input file is replaced; if you do that, keep a backup of the zips until you are sure they were transformed without mistakes.
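If you do want to overwrite the input zip, one way to follow the backup advice is sketched here. replace_zip_safely and demo_transform are hypothetical names; in real use you would pass ProcessZip as the transform. The .bak suffix and the temporary-file approach are my assumptions, not requirements.

```python
import io
import os
import shutil
import tempfile
import zipfile

def replace_zip_safely(path, transform):
    """Rewrite the zip at `path` via transform(bytes) -> bytes, keeping a .bak copy."""
    with open(path, 'rb') as f:
        original = f.read()
    result = transform(original)  # if this raises, nothing on disk has changed
    shutil.copy2(path, path + '.bak')
    # Write to a temporary file in the same directory, then swap it in.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)), suffix='.tmp')
    try:
        with os.fdopen(fd, 'wb') as tmp:
            tmp.write(result)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

def demo_transform(data):
    """Stand-in for ProcessZip: repack the zip, upper-casing every entry's bytes."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as zin, \
         zipfile.ZipFile(out, 'w', zipfile.ZIP_DEFLATED) as zout:
        for name in zin.namelist():
            zout.writestr(name, zin.read(name).upper())
    return out.getvalue()

# Try it on a throwaway zip in a temporary directory.
workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, 'fin.zip')
with zipfile.ZipFile(zip_path, 'w') as zf:
    zf.writestr('a.txt', 'hello')

replace_zip_safely(zip_path, demo_transform)

with zipfile.ZipFile(zip_path) as zf:
    new_content = zf.read('a.txt')
with zipfile.ZipFile(zip_path + '.bak') as zf:
    backup_content = zf.read('a.txt')
print(new_content, backup_content)
```

The transform runs before anything is written, so a failure while processing (for example the assert in the script) leaves the original zip untouched.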

The lines where the jdata object is modified are marked in the code; change them as your task requires. jdata is decoded from JSON, modified, and then encoded back to JSON. Note that every .json and .zip file in the whole hierarchy is processed; if you need to limit that, extend the condition elif fname.endswith('.json').
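As a rough sketch of both adjustments, the two helpers below narrow which entries get rewritten and isolate the wrapping step. should_modify and wrap_schema are hypothetical names; restricting to entries under JSONfolder and taking 'name' from the original 'title' are assumptions about your requirements, so adapt them as needed.

```python
import json

def should_modify(fname):
    """Assumed rule: only .json entries located under a folder named JSONfolder."""
    parts = fname.split('/')  # names inside a zip always use forward slashes
    return fname.lower().endswith('.json') and 'JSONfolder' in parts[:-1]

def wrap_schema(jdata, owner='jack'):
    """Wrap the original object under 'schema'; 'name' comes from its title (an assumption)."""
    return {
        'name': jdata.get('title', 'xyz'),
        'type': 'JSON',
        'schema': jdata,
        'owner': owner,
    }

original = json.loads(
    '{"title": "xyz", "type": "object", "properties": {"id": "123", "name": "xyz"}}'
)
wrapped = wrap_schema(original)
print(json.dumps(wrapped, indent=4))
print(should_modify('JSONfolder/a.json'), should_modify('other/a.json'))
```

In the script, the branch condition would then become elif should_modify(fname): and the dictionary literal would become jdata = wrap_schema(jdata).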

It looks like you have a zip nested inside another zip. That's why I made a separate ProcessZip() function: it calls itself recursively to process nested zips, so it can handle any nesting depth.
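To try the recursion without touching real data, you can build a small nested zip in memory and walk it the same way. This is only a sketch: make_zip and list_entries are hypothetical helpers, and the layout and file contents are my guess at the hierarchy from the question.

```python
import io
import zipfile

def make_zip(entries):
    """Build an in-memory zip from a {name: bytes} mapping and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED) as zf:
        for name, data in entries.items():
            zf.writestr(name, data)
    return buf.getvalue()

def list_entries(zip_bytes, prefix=''):
    """Recursively list entry names, descending into nested .zip entries."""
    names = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            names.append(prefix + name)
            if name.endswith('.zip'):
                # Same idea as ProcessZip: read the nested zip's bytes and recurse.
                names.extend(list_entries(zf.read(name), prefix + name + '!/'))
    return names

# Assumed layout, loosely following the hierarchy in the question.
inner = make_zip({'JSONfolder/sample.json': b'{"title": "xyz"}'})
outer = make_zip({
    'folder1/dbconfig/config1.xml': b'<config/>',
    'folder1/documentsfolder/jsonobjects.zip': inner,
})
entries = list_entries(outer)
for entry in entries:
    print(entry)
```

The bytes in outer can be passed straight to ProcessZip(), since it takes the zip content as bytes rather than a file name.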

Update: I've added an example of XML to JSON conversion. It needs the xmltodict module, installed with the command python -m pip install xmltodict. Since XML can be converted to JSON in different ways (e.g. JSON has no attributes), you may need to adjust the converted content to suit your needs. Also note that after converting from XML I change the zipped file's extension from .xml to .json.

import zipfile, json, io
# Needs: python -m pip install xmltodict
import xmltodict

def ProcessZip(file_data):
    res = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(file_data), mode = 'r') as fin, \
         zipfile.ZipFile(res, mode = 'w', compression = zipfile.ZIP_DEFLATED) as fout:
        for fname in fin.namelist():
            data = fin.read(fname)
            if fname.endswith('.zip'):
                data = ProcessZip(data)
            elif fname.endswith('.json'):
                jdata = json.loads(data.decode('utf-8-sig'))
                # Check that we don't modify already modified file
                assert 'schema' not in jdata, 'JSON file "%s" already modified inside zip!' % fname
                # Modify JSON content here
                jdata = {
                    'name': 'xyz',
                    'type': 'JSON',
                    'schema': jdata,
                    'owner': 'jack',
                }
                data = json.dumps(jdata, indent = 4).encode('utf-8')
            elif fname.endswith('.xml'):
                jdata = xmltodict.parse(data)
                jdata = {
                    'name': 'xyz',
                    'type': 'JSON',
                    'schema': jdata,
                    'owner': 'jack',
                }
                data = json.dumps(jdata, indent = 4).encode('utf-8')
                fname = fname[:fname.rfind('.')] + '.json'
            fout.writestr(fname, data)
    return res.getvalue()

with open('fin.zip', 'rb') as fin:
    res = ProcessZip(fin.read())
with open('fout.zip', 'wb') as fout:
    fout.write(res)