I'm currently working on a backup solution. Once the files are synchronized, I'd also like to build a list of the files/directories (their permissions/uid/gid) and save it somewhere.
Currently I have a backup "snapshot" with 4105 files and 574 directories. I'm using Python to walk through the snapshot and collect this information, and it works quite well, but here is the catch.
First I tried collecting all of that info and writing it into a single file; it ended up 170 MB in size. Not so good.
Then I decided to split the info on a per-directory basis and write one file per directory; I still ended up with 106 MB of total disk usage.
The script os.walk()s the tree, saves the directory info in one list, then does the same for the files. The two lists are combined into one dictionary, which is then JSON-encoded and written to disk as one small file per directory.
Do you have any recommendation on how to reduce the disk usage?
I haven't tried SQLite or MySQL as a storage engine for this info; I suspect that would end up as a database a few GB in size (a rough sketch of the SQLite idea follows the script below).
Thank you for your recommendations and assistance; the code below is just to give a feel for what I'm using.
This is the script that does the job:
import os, sys
import json

zdir = {}
filestat = []
dirstat = []

for path, dirs, files in os.walk("/backup/us-s01", followlinks=False):
    try:
        # Collect stat info for every file in the current directory
        for file in files:
            st = os.stat(os.path.join(path, file))
            file_stat = {
                'name': file,
                'perm': oct(st.st_mode)[-4:],  # last four octal digits, e.g. 0644
                'uid': st.st_uid,
                'gid': st.st_gid,
                'size': st.st_size
            }
            filestat.append(file_stat)
        # Collect stat info for every subdirectory of the current directory
        for di in dirs:
            std = os.stat(os.path.join(path, di))
            di_stat = {
                'name': di,
                'perm': oct(std.st_mode)[-4:],
                'uid': std.st_uid,
                'gid': std.st_gid,
                'size': std.st_size
            }
            dirstat.append(di_stat)
        # Write one JSON file per directory, using the flattened path as the file name
        pa = path.replace('/', '-')
        zdir = {'files': filestat, 'dirs': dirstat}
        f = open('/root/test/json' + pa + 'dat', 'w')
        f.write(json.dumps(zdir, separators=(',', ':')))
        f.close()
    except OSError:
        pass
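For reference, here is a rough sketch of the SQLite variant mentioned above; the database path, table name and schema are only placeholders I haven't benchmarked:
import os
import sqlite3
# hypothetical database location
conn = sqlite3.connect('/root/test/meta.db')
conn.execute('''CREATE TABLE IF NOT EXISTS entries (
                    path TEXT,
                    name TEXT,
                    type TEXT,     -- 'f' for file, 'd' for directory
                    perm TEXT,
                    uid  INTEGER,
                    gid  INTEGER,
                    size INTEGER)''')
for path, dirs, files in os.walk("/backup/us-s01", followlinks=False):
    rows = []
    for name in files:
        try:
            st = os.stat(os.path.join(path, name))
            rows.append((path, name, 'f', oct(st.st_mode)[-4:], st.st_uid, st.st_gid, st.st_size))
        except OSError:
            pass
    for name in dirs:
        try:
            st = os.stat(os.path.join(path, name))
            rows.append((path, name, 'd', oct(st.st_mode)[-4:], st.st_uid, st.st_gid, st.st_size))
        except OSError:
            pass
    # insert all rows for this directory in one go
    conn.executemany('INSERT INTO entries VALUES (?, ?, ?, ?, ?, ?, ?)', rows)
conn.commit()
conn.close()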
You can just use gzip for the output:
import gzip
# your code as posted, up to the point where the per-directory dictionary is built
zdir = {'files': filestat, 'dirs': dirstat}
string_out = json.dumps(zdir, separators=(',', ':'))
f = gzip.open('/root/test/json' + pa + 'gz', 'wb')
f.write(string_out)
f.close()
I've made a test with this and found that it compresses the output to about 10% of the disk usage compared to writing the same string to a plain text file.
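To read a snapshot back later, assuming the same file-naming scheme as above, something like this should do (the file name below is just an example derived from the path "/backup/us-s01"):
import gzip
import json
# open one per-directory snapshot and decode it again
f = gzip.open('/root/test/json-backup-us-s01gz', 'rb')
zdir = json.loads(f.read())
f.close()
print len(zdir['files']), 'files,', len(zdir['dirs']), 'dirs'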