What is the best way in Python to get a denormalized array from this ordered array?


I have this array:

>>> print raw_data
['LEVEL 1',
'SUBJECT A',
'GROUP X',
'COMMENT i',
'COMMENT ii',
'COMMENT iii',
'GROUP Y',
'COMMENT iv',
'COMMENT v',
'COMMENT vi',
'LEVEL 2',
'SUBJECT B',
'GROUP Z',
'COMMENT vii',
'COMMENT viii',
'COMMENT ix',
'SUBJECT C',
'GROUP X2',
'COMMENT x',
'COMMENT xi',
'COMMENT xii',
'COMMENT xiii',
'GROUP Y2',
'COMMENT xiv',
'COMMENT xv',
'COMMENT xvi']

Where the obvious hierarchy is:

  1. Level
    1. Subject
      1. Group
        1. Comments

My objective is to turn this into a denormalized array that can be stored in a database:

>>> print result
[
    ['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT i'],
    ['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT ii'],
    ['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT iii'],
    ['LEVEL 1', 'SUBJECT A', 'GROUP Y', 'COMMENT iv'],
    ['LEVEL 1', 'SUBJECT A', 'GROUP Y', 'COMMENT v'],
    ['LEVEL 1', 'SUBJECT A', 'GROUP Y', 'COMMENT vi'],
    ['LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT vii'],
    ['LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT viii'],
    ['LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT ix'],
    ['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT x'],
    ['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT xi'],
    ['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT xii'],
    ['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT xiii'],
    ['LEVEL 2', 'SUBJECT C', 'GROUP Y2', 'COMMENT xiv'],
    ['LEVEL 2', 'SUBJECT C', 'GROUP Y2', 'COMMENT xv'],
    ['LEVEL 2', 'SUBJECT C', 'GROUP Y2', 'COMMENT xvi']
]

I have been trying to solve this, but I am quite lost. I think this must be a common problem, so I would like to know whether someone has an efficient approach. It seems to be something like nested sets, but I don't know much about handling those in Python. Tagging each item with its level is easy, but I am getting headaches taking it any further:

>>> def addlevel(a):
    if a.startswith('LEVEL'):
        return [1, a]
    elif a.startswith('SUBJECT'):
        return [2, a]
    elif a.startswith('GROUP'):
        return [3, a]
    elif a.startswith('COMMENT'):
        return [4, a]
>>> map(addlevel, raw_data)
[[1, 'LEVEL 1'],
 [2, 'SUBJECT A'],
 [3, 'GROUP X'],
 [4, 'COMMENT i'],
 [4, 'COMMENT ii'],
 [4, 'COMMENT iii'],
 [3, 'GROUP Y'],
 [4, 'COMMENT iv'],
 [4, 'COMMENT v'],
 [4, 'COMMENT vi'],
 [1, 'LEVEL 2'],
 [2, 'SUBJECT B'],
 [3, 'GROUP Z'],
 [4, 'COMMENT vii'],
 [4, 'COMMENT viii'],
 [4, 'COMMENT ix'],
 [2, 'SUBJECT C'],
 [3, 'GROUP X2'],
 [4, 'COMMENT x'],
 [4, 'COMMENT xi'],
 [4, 'COMMENT xii'],
 [4, 'COMMENT xiii'],
 [3, 'GROUP Y2'],
 [4, 'COMMENT xiv'],
 [4, 'COMMENT xv'],
 [4, 'COMMENT xvi']]

I would appreciate any clues!


Solution

  • You could try something like this:

    raw_data = [ 'LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT i', 'COMMENT ii',
    'COMMENT iii', 'GROUP Y', 'COMMENT iv', 'COMMENT v', 'COMMENT vi', 'LEVEL 2',
    'SUBJECT B', 'GROUP Z', 'COMMENT vii', 'COMMENT viii', 'COMMENT ix', 
    'SUBJECT C', 'GROUP X2', 'COMMENT x', 'COMMENT xi', 'COMMENT xii', 
    'COMMENT xiii', 'GROUP Y2', 'COMMENT xiv', 'COMMENT xv', 'COMMENT xvi' ]
    
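    # Most recently seen item at each level of the hierarchy.
    level, subject, group, comment = '', '', '', ''

    result = []

    for item in raw_data:

        if item.startswith('COMMENT'):
            comment = item
        elif item.startswith('GROUP'):
            # A new group starts with no comment yet.
            group = item
            comment = ''
        elif item.startswith('SUBJECT'):
            # A new subject invalidates the current group.
            subject = item
            group = ''
        elif item.startswith('LEVEL'):
            # A new level invalidates the current subject.
            level = item
            subject = ''

        # Only a comment line leaves all four values set, so each
        # comment yields exactly one denormalized row.
        if level and subject and group and comment:
            result.append([level, subject, group, comment])

    import pprint
    pprint.pprint(result)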
    

    Which would yield:

    [['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT i'],
     ['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT ii'],
     ['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT iii'],
     ['LEVEL 1', 'SUBJECT A', 'GROUP Y', 'COMMENT iv'],
     ['LEVEL 1', 'SUBJECT A', 'GROUP Y', 'COMMENT v'],
     ['LEVEL 1', 'SUBJECT A', 'GROUP Y', 'COMMENT vi'],
     ['LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT vii'],
     ['LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT viii'],
     ['LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT ix'],
     ['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT x'],
     ['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT xi'],
     ['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT xii'],
     ['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT xiii'],
     ['LEVEL 2', 'SUBJECT C', 'GROUP Y2', 'COMMENT xiv'],
     ['LEVEL 2', 'SUBJECT C', 'GROUP Y2', 'COMMENT xv'],
     ['LEVEL 2', 'SUBJECT C', 'GROUP Y2', 'COMMENT xvi']]
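
If the hierarchy might gain or lose levels later, the same idea could be generalized by keeping the current "path" in a list instead of four separate variables. This is only a rough sketch, assuming every item can still be recognized by its prefix; the function name `denormalize`, its `prefixes` parameter, and the shortened `sample` list are made up for illustration and do not come from the question.

```python
def denormalize(items, prefixes=('LEVEL', 'SUBJECT', 'GROUP', 'COMMENT')):
    """Flatten an ordered, hierarchical list into one row per leaf item.

    `prefixes` (a hypothetical parameter) lists the hierarchy from the
    outermost level to the innermost; the last prefix marks the leaves.
    """
    path = [None] * len(prefixes)  # current ancestor at each depth
    rows = []
    for item in items:
        # Find the depth of this item from its prefix.
        for depth, prefix in enumerate(prefixes):
            if item.startswith(prefix):
                break
        else:
            raise ValueError('unrecognized item: %r' % (item,))

        path[depth] = item
        # A new parent invalidates everything nested below it.
        for i in range(depth + 1, len(path)):
            path[i] = None

        # Emit a row only for a leaf whose ancestors are all known.
        if depth == len(prefixes) - 1 and None not in path:
            rows.append(list(path))
    return rows


# Shortened sample data, assumed for illustration only.
sample = ['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT i', 'COMMENT ii',
          'GROUP Y', 'COMMENT iii',
          'LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT iv']
rows = denormalize(sample)
```

Called with the full `raw_data` list, this should give the same 16 rows as the loop above. Unlike that loop, it raises an error on an item with an unknown prefix rather than silently skipping it, which may or may not be what you want.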