Search code examples
pythonpython-3.xlistgroupingordereddict

Python group values in 2d list from CSV


I have the following CSV

BBCP1,Grey,2140,805EC0FFFFE2,0000000066
BBCP1,Test,2150,805EC0FFFFE2,0000000066
BBCP1,Test,2151,805EC0FFFFE1,0000000066
BBCP1,Centre,2141,805EC0FFFFE3,000000077
BBCP1,Yellow,2142,805EC0FFFFE3,000000077
BBCP1,Purple,2143,805EC0FFFFE3,000000077
BBCP1,Green,2144,805EC0FFFFE3,000000077
BBCP1,Pink,2145,805EC0FFFFE3,000000077

I'm reading this data in using

data = list(csv.reader(open(csvFile)))

I want to turn this data into a 2d array or equivilent and group by the value in the 4th column (the MAC address), preserving the order they were in in the original list. So it would look like

[(BBCP1,Grey,2140,805EC0FFFFE2,0000000066),(BBCP1,Test,2150,805EC0FFFFE2,0000000066)],
[(BBCP1,Test,2151,805EC0FFFFE1,0000000066)],
[(BBCP1,Centre,2141,805EC0FFFFE3,000000077),
(BBCP1,Yellow,2142,805EC0FFFFE3,000000077),
(BBCP1,Purple,2143,805EC0FFFFE3,000000077),
(BBCP1,Green,2144,805EC0FFFFE3,000000077),
(BBCP1,Pink,2145,805EC0FFFFE3,000000077)]

Hopefully i've displayed the array correctly and it makes sense.

I then need to loop the arrays to output the data to file. Which i'm pretty sure i'm ok with a nested for loop.

Thanks in advance for any help


Solution

  • use defaultdict to group the data (groupby would require sorting and would be unefficient / would kill the order), then print the sorted dictionary values (sorting isn't really necessary, it's just to stabilize the output):

    import csv,collections
    
    d = collections.defaultdict(list)
    
    for row in csv.reader(txt):
        mac_address = row[3]
        d[mac_address].append(row)
    
    print(sorted(d.values()))
    

    resulting in:

    [[['BBCP1', 'Centre', '2141', '805EC0FFFFE3', '000000077'],
      ['BBCP1', 'Yellow', '2142', '805EC0FFFFE3', '000000077'],
      ['BBCP1', 'Purple', '2143', '805EC0FFFFE3', '000000077'],
      ['BBCP1', 'Green', '2144', '805EC0FFFFE3', '000000077'],
      ['BBCP1', 'Pink', '2145', '805EC0FFFFE3', '000000077']],
     [['BBCP1', 'Grey', '2140', '805EC0FFFFE2', '0000000066'],
      ['BBCP1', 'Test', '2150', '805EC0FFFFE2', '0000000066']],
     [['BBCP1', 'Test', '2151', '805EC0FFFFE1', '0000000066']]]
    

    sorting according to key (the mac address):

    values = [v for _,v in sorted(d.items())]
    

    yields:

    [[['BBCP1', 'Test', '2151', '805EC0FFFFE1', '0000000066']],
     [['BBCP1', 'Grey', '2140', '805EC0FFFFE2', '0000000066'],
      ['BBCP1', 'Test', '2150', '805EC0FFFFE2', '0000000066']],
     [['BBCP1', 'Centre', '2141', '805EC0FFFFE3', '000000077'],
      ['BBCP1', 'Yellow', '2142', '805EC0FFFFE3', '000000077'],
      ['BBCP1', 'Purple', '2143', '805EC0FFFFE3', '000000077'],
      ['BBCP1', 'Green', '2144', '805EC0FFFFE3', '000000077'],
      ['BBCP1', 'Pink', '2145', '805EC0FFFFE3', '000000077']]]