python bash shell sequence ls

Bash alias to automatically detect arbitrarily named file sequences?


I'm looking to make a bash alias that will change the results of ls. I am constantly dealing with large sequences of files that do not follow the same naming conventions. The only thing they have in common is that the frame number is zero-padded to four digits and immediately precedes the extension.

e.g. filename_v028_0392.bgeo, test_x34.prerun.0012.simdata, filename_v001_0233.exr

I would like each sequence to be listed as a single element, so that

filename_v003_0001.geo
filename_v003_0002.geo
filename_v003_0003.geo
filename_v003_0004.geo
filename_v003_0005.geo
filename_v003_0006.geo
filename_v003_0007.geo
filename_v003_0032.geo
filename_v003_0033.geo
filename_v003_0034.geo
filename_v003_0035.geo
filename_v003_0036.geo
testxxtest.0057.exr
testxxtest.0058.exr
testxxtest.0059.exr
testxxtest.0060.exr
testxxtest.0061.exr
testxxtest.0062.exr
testxxtest.0063.exr

would be displayed as something along the lines of

[seq]filename_v003_####.geo (1-7)
[seq]filename_v003_####.geo (32-36)
[seq]testxxtest.####.exr (57-63)

while files that are not part of a sequence are still listed unaltered.
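Because the frame number is always four digits wide and sits right before the last extension, this rule can be matched directly with a regular expression. The following is a minimal Python 3 sketch under exactly that assumption; FRAME_RE, sort_key and collapse are hypothetical names, and sorting by (prefix, extension, frame) is an illustrative choice so that the frames of one sequence end up adjacent.

```python
import re

# Assumption: a four-digit frame number sits right before the final extension,
# e.g. "filename_v003_0001.geo" -> ("filename_v003_", "0001", ".geo").
FRAME_RE = re.compile(r'^(.*?)(\d{4})(\.[^.]+)$')


def sort_key(name):
    """Sort frames by (prefix, extension, frame); other names by their own text."""
    m = FRAME_RE.match(name)
    return (m.group(1), m.group(3), int(m.group(2))) if m else (name, '', 0)


def collapse(names):
    """Collapse runs of consecutive frames into '[seq]prefix####ext (first-last)'."""
    result = []
    run = []  # (prefix, frame, ext, name) tuples of the current consecutive run

    def flush():
        if len(run) == 1:
            result.append(run[0][3])  # a lone frame is listed unaltered
        elif run:
            prefix, first, ext, _ = run[0]
            result.append('[seq]{}####{} ({}-{})'.format(prefix, ext, first, run[-1][1]))
        del run[:]

    for name in sorted(names, key=sort_key):
        m = FRAME_RE.match(name)
        if m is None:
            flush()
            result.append(name)  # not part of a sequence: keep as-is
            continue
        entry = (m.group(1), int(m.group(2)), m.group(3), name)
        last = run[-1] if run else None
        if last is None or (last[0], last[2], last[1] + 1) != (entry[0], entry[2], entry[1]):
            flush()  # different prefix/extension, or a gap in the frame numbers
        run.append(entry)
    flush()
    return result


if __name__ == "__main__":
    files = ['filename_v003_%04d.geo' % i for i in list(range(1, 8)) + list(range(32, 37))]
    files += ['testxxtest.%04d.exr' % i for i in range(57, 64)] + ['notes.txt']
    for line in collapse(files):
        print(line)
```

Sorting on the parsed key rather than on the raw names keeps, say, a.0001.exr and a.0002.exr together even when a.0001.geo exists, since plain lexicographic order would interleave the two extensions.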

I'm really not sure where to start. I know a decent amount of Python, but I'm not sure it would be the best way to go about it. Any help would be greatly appreciated!

Thanks


Solution

  • I have a Python 2.7 script that solves your problem by solving the more general problem of collapsing consecutive lines that differ only by a sequence number:

    import re
    
    def do_compress(old_ints, ints):
        """
        Check whether the integers of the current entry continue those of the
        previous entry. Return the list of indexes that increased by exactly 1,
        or [] / False when the current line is not part of an indexed sequence.
        """
        return len(old_ints) == len(ints) and \
            [i for i, (o, n) in enumerate(zip(old_ints, ints)) if n - o == 1]
    
    def basic_format(file_start, file_stop):
        return "[seq]{} .. {}".format(file_start, file_stop)
    
    
    def compress(files, do_compress=do_compress, seq_format=basic_format):
        p = None
        old_ints = ()
        old_indexes = ()
    
        seq_and_files_list = [] 
            # list of file names or dictionaries that represent sequences:
            #   {start, stop, start_f, stop_f}
    
        for f in files:
            ints = ()
            indexes = ()
    
            m = p is not None and p.match(f) # False, None, or a valid match
            if m:
                ints = [int(x) for x in m.groups()]
                indexes = do_compress(old_ints, ints)
    
            # state variations
            if not indexes: # end of sequence or no current sequence
                p = re.compile(
                    r'(\d+)'.join(re.escape(x) for x in re.split(r'\d+', f)) + '$')
                m = p.match(f)
                old_ints = [int(x) for x in m.groups()]
                old_indexes = ()
                seq_and_files_list.append(f)
    
            elif indexes == old_indexes: # the sequence continues
                seq_and_files_list[-1]['stop'] = old_ints = ints
                seq_and_files_list[-1]['stop_f'] = f
                old_indexes = indexes
    
            elif old_indexes == (): # sequence started on previous filename
                start_f = seq_and_files_list.pop()
                s = {'start': old_ints, 'stop': ints, \
                    'start_f': start_f, 'stop_f': f}
                seq_and_files_list.append(s)
    
                old_ints = ints
                old_indexes = indexes
    
            else: # end of sequence, but still matches previous pattern
                old_ints = ints
                old_indexes = ()
                seq_and_files_list.append(f)
    
        return [ isinstance(f, dict) and seq_format(f['start_f'], f['stop_f']) or f 
            for f in seq_and_files_list ]
    
    
    if __name__ == "__main__":
        import sys
        if len(sys.argv) == 1:
            import os
            lst = sorted(os.listdir('.'))
        elif sys.argv[1] in ("-h", "--help"):
            print("""USAGE: {} [FILE ...]
    Compress the listing of the current directory, or the contents of the given
    files, by collapsing lines that are identical except for a sequence number.
    """.format(sys.argv[0]))
            sys.exit(0)
        else:
            # read every line of every file given on the command line
            lst = [l.rstrip('\r\n') for f in sys.argv[1:] for l in open(f)]
        for x in compress(lst):
            print(x)
    

    Running it on your data gives:

    bernard $ ./ls_sequence_compression.py given_data
    [seq]filename_v003_0001.geo .. filename_v003_0007.geo
    [seq]filename_v003_0032.geo .. filename_v003_0036.geo
    [seq]testxxtest.0057.exr .. testxxtest.0063.exr
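The script's basic_format prints the first and last file names, whereas the question asked for [seq]filename_v003_####.geo (1-7). Since compress() accepts a seq_format callback that receives the first and last file names, a custom formatter can produce that layout. The sketch below is hypothetical (hash_format and LAST_NUMBER_RE are not part of the script) and assumes the sequence counter is the last run of digits in each name, which holds for frame-numbered files but not for every sequence compress() can detect.

```python
import re

# Hypothetical helper for the script's compress(): it has the seq_format
# signature, i.e. it receives the first and last file names of a sequence.
LAST_NUMBER_RE = re.compile(r'^(.*?)(\d+)(\D*)$')


def hash_format(file_start, file_stop):
    """Render a sequence as '[seq]prefix####suffix (first-last)'.

    Assumes the sequence counter is the last run of digits in each name.
    """
    prefix, first, suffix = LAST_NUMBER_RE.match(file_start).groups()
    last = LAST_NUMBER_RE.match(file_stop).group(2)
    # One '#' per digit of the zero-padded counter; the range is printed unpadded.
    return '[seq]{}{}{} ({}-{})'.format(prefix, '#' * len(first), suffix,
                                        int(first), int(last))


if __name__ == "__main__":
    print(hash_format('filename_v003_0001.geo', 'filename_v003_0007.geo'))
    # [seq]filename_v003_####.geo (1-7)
```

With the script above, calling compress(lst, seq_format=hash_format) on the question's data should yield [seq]filename_v003_####.geo (1-7), [seq]filename_v003_####.geo (32-36) and [seq]testxxtest.####.exr (57-63).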
    

    It compares the integers found in two consecutive lines whose non-digit text is identical: a sequence continues as long as the same integer field(s) keep increasing by exactly 1 from one line to the next. This lets it deal with non-uniform input, including a change of the field used as the sequence counter.
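The two checks involved can be isolated in a short Python 3 sketch. skeleton_pattern and incremented_fields are hypothetical helper names; they reproduce the pattern construction in compress() (each digit run becomes a capture group, everything else is matched literally) and the "+1" test in do_compress().

```python
import re


def skeleton_pattern(name):
    """Regex matching names that differ from `name` only in their digit runs."""
    parts = re.split(r'\d+', name)  # the non-digit text between numbers
    return re.compile(r'(\d+)'.join(re.escape(p) for p in parts) + '$')


def incremented_fields(old_ints, ints):
    """Indexes of the integer fields that grew by exactly 1."""
    if len(old_ints) != len(ints):
        return []
    return [i for i, (o, n) in enumerate(zip(old_ints, ints)) if n - o == 1]


if __name__ == "__main__":
    pattern = skeleton_pattern('filename_v003_0007.geo')
    m = pattern.match('filename_v003_0008.geo')
    print([int(x) for x in m.groups()])        # [3, 8]
    print(pattern.match('testxxtest.0057.exr'))  # None: different non-digit text
    print(incremented_fields([3, 7], [3, 8]))   # [1]: same sequence
    print(incremented_fields([3, 7], [3, 32]))  # []: the sequence breaks
```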

    Here is an example of input:

    01 - test8.txt
    01 - test9.txt
    01 - test10.txt
    02 - test11.txt
    02 - test12.txt
    03 - test13.txt
    04 - test13.txt
    05 - test13.txt
    06
    07
    08
    09
    10
    

    which gives:

    [seq]01 - test8.txt .. 01 - test10.txt
    [seq]02 - test11.txt .. 02 - test12.txt
    [seq]03 - test13.txt .. 05 - test13.txt
    [seq]06 .. 10
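The split points in this example follow from which integer fields grow by exactly 1 between consecutive lines. A short Python 3 sketch makes those index sets visible; grown_fields is a hypothetical helper, and it assumes the lines already share the same non-digit text, as the first eight input lines do.

```python
import re

LINES = ["01 - test8.txt", "01 - test9.txt", "01 - test10.txt",
         "02 - test11.txt", "02 - test12.txt", "03 - test13.txt",
         "04 - test13.txt", "05 - test13.txt"]


def grown_fields(lines):
    """For each pair of consecutive lines, list the indexes of the integer
    fields that increased by exactly 1."""
    result = []
    for prev, cur in zip(lines, lines[1:]):
        old_ints = [int(x) for x in re.findall(r'\d+', prev)]
        ints = [int(x) for x in re.findall(r'\d+', cur)]
        result.append([i for i, (o, n) in enumerate(zip(old_ints, ints)) if n - o == 1])
    return result


if __name__ == "__main__":
    print(grown_fields(LINES))
    # [[1], [1], [0, 1], [1], [0, 1], [0], [0]]
```

The set changes from [1] to [0, 1] at 02 - test11.txt, which closes the first sequence; 02 - test11.txt then starts the next one once [1] reappears. Likewise, [0] taking over between 03 - test13.txt and 04 - test13.txt makes the leading field the counter for the third sequence.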
    

    Any comment is welcome!

    Hah... I nearly forgot: without arguments, this script outputs the collapsed listing of the current directory.