
Txt file python unique values


So I have a txt file with many lines that look like this:

2107|Business|2117|Art|2137|Art|2145|English

Essentially, each line is one student's sequence of declared majors, and each major is preceded by an encoded semester and year in which it was declared. I want to extract the semester in which each unique major was first declared. From the line above I would need:
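To make the layout concrete: splitting the sample line on `|` and pairing the even-indexed fields (semester codes) with the odd-indexed fields (majors) recovers the structure described above. This is a minimal sketch that assumes every line has an even number of fields; the variable names are illustrative.

```python
line = "2107|Business|2117|Art|2137|Art|2145|English"

fields = line.split("|")
# Even positions hold semester codes, odd positions hold majors.
pairs = list(zip(fields[0::2], fields[1::2]))
print(pairs)
# [('2107', 'Business'), ('2117', 'Art'), ('2137', 'Art'), ('2145', 'English')]
```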

2107: Business

2117: Art

2145: English

I was trying to do this with Pandas in Python but can't get anything to work. Any help is appreciated.

EDIT: I should have clarified: I don't want the code to pick up the second instance of Art, only the first declaration of each major together with the semester code that precedes it.
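The rule in the EDIT (keep the first declaration of each major, ignore later repeats) can be sketched with a plain dict, since `dict.setdefault` only stores a value the first time a key appears and dicts preserve insertion order in Python 3.7+. `first_declarations` is a hypothetical helper name, and the sketch assumes one student per line with an even number of fields:

```python
def first_declarations(line):
    """Map each major to the semester code of its first declaration on one line."""
    fields = line.strip().split("|")
    first = {}
    for semester, major in zip(fields[0::2], fields[1::2]):
        # setdefault leaves an existing entry alone, so later repeats are ignored
        first.setdefault(major, semester)
    return first


for major, semester in first_declarations("2107|Business|2117|Art|2137|Art|2145|English").items():
    print("{}: {}".format(semester, major))
# 2107: Business
# 2117: Art
# 2145: English
```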


Solution

  • Use Python's csv module to split each row into a list of cells. You can then use the grouper() recipe from the itertools documentation, which collects n items at a time from an iterable:

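    import csv
    import itertools
    
    def grouper(iterable, n, fillvalue=None):
        "Collect data into fixed-length chunks or blocks"
        # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
        args = [iter(iterable)] * n
        return itertools.zip_longest(*args, fillvalue=fillvalue)
    
    seen = set()  # majors already reported (shared across all lines)
    
    # newline='' is what the csv module expects for files opened in text mode
    with open('input3.txt', newline='') as f_input:
        for row in csv.reader(f_input, delimiter='|'):
            # Cells alternate: semester code, major, semester code, major, ...
            for semester, major in grouper(row, 2):
                if major not in seen:
                    print("{}: {}".format(semester, major))
                    seen.add(major)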
    

    So for the example line from your file, this gives:

    2107: Business
    2117: Art
    2145: English
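One design point worth noting: `seen` is created once, outside the loop, so a major reported on one line is suppressed on every later line. If each line is a separate student and you want each student's own first declarations, the set has to be reset per row. A sketch under that assumption, reading from an in-memory `io.StringIO` instead of `input3.txt` so it runs standalone; `first_per_student` and the second sample line are hypothetical. It also swaps the grouper recipe for `zip(it, it)` over a single shared iterator, which pairs cells the same way but silently drops a trailing unpaired cell instead of padding it with `None`:

```python
import csv
import io


def first_per_student(f):
    """Return, for each row, the (semester, major) pairs of first declarations."""
    results = []
    for row in csv.reader(f, delimiter="|"):
        seen = set()  # reset for every student (row)
        firsts = []
        it = iter(row)
        # zip(it, it) pulls two cells at a time from the same iterator
        for semester, major in zip(it, it):
            if major not in seen:
                firsts.append((semester, major))
                seen.add(major)
        results.append(firsts)
    return results


sample = io.StringIO(
    "2107|Business|2117|Art|2137|Art|2145|English\n"
    "2111|Art|2121|Business\n"  # hypothetical second student
)
for firsts in first_per_student(sample):
    for semester, major in firsts:
        print("{}: {}".format(semester, major))
```

With the shared `seen` set from the answer above, the second line would print nothing, because Art and Business were already reported for the first student.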