Search code examples
pythondata-processing

Combining values from 2 rows using python


I have a 2D array data that looks like this in excel :

01-Jan-1990 0:00:01 A abcdefghi jklmnopq
01-Jan-1990 0:00:02 A abcdefghi xxxyyyvvv fefwwe qqqqq
01-Jan-1990 0:00:02   xwwwww xxxxxxx yyyy
01-Jan-1990 0:00:05 B qwerty qwerty
01-Jan 1990 0:00:06 C popopop qwqwqwq tytytyty sss

and in data,

[['01-Jan-1990 0:00:01','A','abcdefghi jklmnopq'],
 ['01-Jan-1990 0:00:02','A','abcdefghi xxxyyyvvv fefwwe qqqqq'],
 ['01-Jan-1990 0:00:02','\t','xwwwww xxxxxxx yyyy'],      #\t instead of empty
 ['01-Jan-1990 0:00:05','B','qwerty qwerty'],
 ['01-Jan 1990 0:00:06','C','popopop qwqwqwq tytytyty sss']]

Would wanna ask if there is any ways i can combine the value from row 3 into the string of row 2 that do not have the A,B,C?

Desired output as followed:

01-Jan-1990 0:00:01 A abcdefghi jklmnopq
01-Jan-1990 0:00:02 A abcdefghi xxxyyyvvv fefwwe qqqqq xwwwww xxxxxxx yyyy
01-Jan-1990 0:00:05 B qwerty qwerty
01-Jan 1990 0:00:06 C popopop qwqwqwq tytytyty sss

EDITED WITH DESIRED OUTPUT and missing field


Solution

  • Here is a digestible solution with comments :)

    EMPTY_MARKER = '\t'
    LABEL_INDEX, VALUES_INDEX = 1, 2
    
    # first pass to append rows with empty labels to previous row
    for row in range(len(data) - 1):
        if data[row + 1][LABEL_INDEX] == EMPTY_MARKER:
            data[row][VALUES_INDEX] += ' {}'.format(data[row + 1][VALUES_INDEX])
    
    # second pass to remove empty label rows
    for row in data:
        if row[LABEL_INDEX] == EMPTY_MARKER:
            data.remove(row)
    
    # visually readable way to display output :)
    for row in data:
        print row
    

    output:

    ['01-Jan-1990 0:00:01', 'A', 'abcdefghi jklmnopq']
    ['01-Jan-1990 0:00:02', 'A', 'abcdefghi xxxyyyvvv fefwwe qqqqq xwwwww xxxxxxx yyyy']
    ['01-Jan-1990 0:00:05', 'B', 'qwerty qwerty']
    ['01-Jan 1990 0:00:06', 'C', 'popopop qwqwqwq tytytyty sss']