So I have a txt file with many lines that look like this:
2107|Business|2117|Art|2137|Art|2145|English
Essentially, each line is a random student's majors, each preceded by the encoded semester and year in which it was declared. What I want to do is read in the semester in which each unique major was first declared. From the line above I would need:
2107: Business
2117: Art
2145: English
I was attempting to do this with Pandas in Python but really can't get anything to work. Any help would be appreciated.
EDIT: I should have clarified. I don't want the code to read in the second instance of Art, only the first declaration of each major and the semester that precedes it.
Use Python's csv library to split each row into a list of cells. You can then use the grouper() recipe from the itertools documentation, which takes n items at a time from an iterable (the code below is written for Python 2):
import csv
import itertools

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return itertools.izip_longest(fillvalue=fillvalue, *args)

seen = set()

with open('input3.txt', 'rb') as f_input:
    for row in csv.reader(f_input, delimiter='|'):
        for k, v in grouper(row, 2):
            if v not in seen:
                print "{}: {}".format(k, v)
                seen.add(v)
So for your example file line, this would give you:
2107: Business
2117: Art
2145: English
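The code above is Python 2 (izip_longest, the print statement, and opening the file in 'rb' mode). If you are on Python 3, a rough sketch of the same idea might look like the following. A few things here are my own assumptions rather than part of the original: the helper name first_declarations is made up for illustration, the cells are paired with zip() over slices instead of the grouper() recipe (so a trailing unpaired cell is simply dropped), and the seen set is reset for every line on the guess that each line is a separate student. The code above instead shares one seen set across the whole file, so keep that behaviour if it is what you want. An io.StringIO stands in for the real file so the example runs on its own.

```python
import csv
import io


def first_declarations(cells):
    """Return (semester, major) pairs for the first time each major appears in one row."""
    seen = set()  # reset per row: assumes one student per line
    result = []
    # Even positions hold semester codes, odd positions hold majors.
    for semester, major in zip(cells[0::2], cells[1::2]):
        if major not in seen:
            seen.add(major)
            result.append((semester, major))
    return result


# Stand-in for the input file, using the example line from the question.
sample = io.StringIO("2107|Business|2117|Art|2137|Art|2145|English\n")

for row in csv.reader(sample, delimiter="|"):
    for semester, major in first_declarations(row):
        print("{}: {}".format(semester, major))
```

To read a real file instead, replace the StringIO object with open('input3.txt', newline='') inside a with block, which is how the csv documentation recommends opening files in Python 3.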