Tags: python, indices, unique-values

Collecting all the indices of unique elements in a CSV file and populating them in a row


I have a set of data in CSV file like this:

[['1', '1.5', '1', '2', '1.5', '2'],
 ['2', '2.5', '3', '2.5', '3', '2.5'],
 ['3', '2.5', '1.5', '1', '1', '3'],
 ['1.5', '1', '2', '2', '2', '2.5'],
 ['1.5', '1.5', '1', '2.5', '1', '3']]

I want to find all the unique entries in this data, listed in ascending order. I have tried this code:

    import csv
    import numpy

    dim1 = []
    with open('D:/TABLE/unique_values.csv') as f1:
        for rows in f1.readlines():
            dim1.append(rows.strip().split(','))

    uniqueValues = numpy.unique(dim1)
    print('Unique Values : ', uniqueValues)

and it gives me this output:

Unique Values :  ['1' '1.5' '2' '2.5' '3']

I want to list these unique entries in a column in a CSV file and write their running indices in a row against each unique entry. The desired sample output is shown below.

Sample Output

I have tried other numpy functions, but they only return the first occurrence of each unique entry. I have also seen other relevant posts, but they do not populate the running indices of each unique element in a row.
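For example, numpy.unique with return_index=True only reports the position of the first occurrence of each value in the flattened data, not every position (sketch using the data above):

    import numpy

    dim1 = [['1', '1.5', '1', '2', '1.5', '2'],
            ['2', '2.5', '3', '2.5', '3', '2.5'],
            ['3', '2.5', '1.5', '1', '1', '3'],
            ['1.5', '1', '2', '2', '2', '2.5'],
            ['1.5', '1.5', '1', '2.5', '1', '3']]

    # return_index only gives the position of the *first* occurrence of each
    # unique value in the flattened array
    uniqueValues, firstIndices = numpy.unique(dim1, return_index=True)
    print(uniqueValues)    # ['1' '1.5' '2' '2.5' '3']
    print(firstIndices)    # [0 1 3 7 8] -- first occurrences only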


Solution

  • This would be fairly straightforward with some functions from the standard library: collections.defaultdict, csv.reader, and itertools.count. Something like:

    import csv
    import collections
    import itertools

    # map each unique value to the list of running indices where it appears
    data = collections.defaultdict(list)

    # running counter over every cell, starting at 1
    index = itertools.count(1)
    with open('D:/TABLE/unique_values.csv') as f1:
        reader = csv.reader(f1)

        for row in reader:
            for value in row:
                data[value].append(next(index))

    for unique_value, indices in data.items():
        print(f"{unique_value}:", *indices)