I have a list courses_per_semester that looks like this:
[[['CS105', 'ENG101', 'MATH101', 'GER'], ['ENG102', 'CS230', 'MATH120', 'GER'], ['CS205', 'FREE'], ['GER'], ['CS106', 'CS215', 'CS107', 'ENG204'], ['GER', 'MATH220', 'CS300', 'CS206'], ['CS306', 'GER'], ['FREE'], ['CS312', 'CS450', 'GER', 'CS321', 'FREE'], ['CS325', 'GER', 'CS322', 'MAJOR'], ['CS310', 'STAT205'], [''], ['CS443', 'CS412', 'CS421', 'GER'], ['CS444', 'FREE', 'FREE', ''], ['', '']], [['CS105', 'ENG101', 'MATH101', 'GER'], ['ENG102', 'CS230', 'MATH120', 'GER'], ['CS205', 'FREE'], ['GER'], ['CS106', 'CS215', 'CS107', 'ENG204'], ['GER', 'MATH220', 'CS300', 'CS206'], ['CS306', 'GER'], ['FREE'], ['CS312', 'CS450', 'GER', 'CS321', 'FREE'], ['CS325', 'GER', 'CS322', 'MAJOR'], ['CS310', 'STAT205'], [''], ['CS443', 'CS412', 'CS421', 'GER'], ['CS444', 'FREE', 'FREE', ''], ['', '']], [['CS105', 'ENG101', 'MATH101', 'GER'], ['ENG102', 'CS230', 'MATH120', 'GER'], ['CS205', 'FREE'], ['GER'], ['CS106', 'CS215', 'CS107', 'ENG204'], ['GER', 'MATH220', 'CS300', 'CS206'], ['CS306', 'GER'], ['FREE'], ['CS312', 'CS450', 'GER', 'CS321', 'FREE'], ['CS325', 'GER', 'CS322', 'MAJOR'], ['CS310', 'STAT205'], [''], ['CS443', 'CS412', 'CS421', 'GER'], ['CS444', 'FREE', 'FREE', ''], ['', '']], [['CS105', 'ENG101', 'MATH101', 'GER'], ['ENG102', 'CS230', 'MATH120', 'GER'], ['CS205', 'FREE'], ['GER'], ['CS106', 'CS215', 'CS107', 'ENG204'], ['GER', 'MATH220', 'CS300', 'CS206'], ['CS306', 'GER'], ['FREE'], ['CS312', 'CS450', 'GER', 'CS321', 'FREE'], ['CS325', 'GER', 'CS322', 'MAJOR'], ['CS310', 'STAT205'], [''], ['CS443', 'CS412', 'CS421', 'GER'], ['CS444', 'FREE', 'FREE', ''], ['', '']], [['CS105', 'ENG101', 'GER', 'GER'], ['ENG102', 'CS230', 'MATH120', 'GER'], ['CS205', 'FREE'], ['GER'], ['CS106', 'CS215', 'CS107', 'ENG204'], ['GER', 'MATH220', 'CS300', 'CS206'], ['CS306', 'GER'], ['FREE'], ['CS312', 'CS450', 'GER', 'CS321', 'FREE'], ['CS325', 'GER', 'CS322', 'MAJOR'], ['CS310', 'STAT205'], [''], ['CS443', 'CS412', 'CS421', 'GER'], ['CS444', 'FREE', 'FREE', ''], ['', '']],...]
Each list is the course path a student has taken until graduation, and each sublist is the combination of courses that student took in one semester. I have 1500 students and I want to create a new list with all the unique combinations for each sublist position. To be more precise, I want to check whether, for example, courses_per_semester[0][0] is the same as courses_per_semester[1][0], courses_per_semester[2][0], ..., courses_per_semester[1499][0], and then do the same for the second sublist of each list. Each time the code finds a unique sublist, I want that combination to be put in a new list, for example first_sublist_combinations.
Most importantly, order should not matter: if one student has the combination ['CS105', 'ENG101', 'MATH101', 'GER'] and another has ['CS105', 'MATH101', 'GER','ENG101'], I want the code to treat them as the same combination. So first_sublist_combinations should contain ['CS105', 'ENG101', 'MATH101', 'GER'] only once, not both ['CS105', 'ENG101', 'MATH101', 'GER'] and ['CS105', 'MATH101', 'GER','ENG101'].
I did not find a way to do this. I tried using sets, but a set keeps only unique values, and some sublists contain the same item more than once (for example two empty strings). A set keeps only one of them, which I cannot accept.
What I tried to do is the following:
for i in range(0,len(courses_per_semester)-1):
    for j in range(i,len(courses_per_semester[i])):
        if courses_per_semester[i][j]==courses_per_semester[i+1][j]:
            first_sublist_combinations.append(courses_per_semester[i][j])
but it does not work, probably because I am not thinking about the problem in the right way. I also tried converting each sublist into a set:
course_sets_per_semester = [[set(courses_per_semester) for courses_per_semester in sublist] for sublist in courses_per_semester]
but this way each string appears only once even if it occurs twice in a sublist. I therefore cannot compare the sublists correctly, because one sublist can end up shorter than another that is supposed to have the same length.
The result I am after would look like, for example, first_sublist_combinations = [['CS105', 'MATH101', 'GER','ENG101'],['CS105','MATH101','GER','GER'],..]
Assuming x is your input list:
res=list(map(lambda c: list(set(c)), zip(*map(lambda a: list(map(lambda b: tuple(sorted(b)), a)), x))))
which, as per your example, outputs:
[[('CS105', 'ENG101', 'GER', 'MATH101')], [('CS230', 'ENG102', 'GER', 'MATH120')], [('CS205', 'FREE')], [('GER',)], [('CS106', 'CS107', 'CS215', 'ENG204')], [('CS206', 'CS300', 'GER', 'MATH220')], [('CS306', 'GER')], [('FREE',)], [('CS312', 'CS321', 'CS450', 'FREE', 'GER')], [('CS322', 'CS325', 'GER', 'MAJOR')], [('CS310', 'STAT205')], [('',)], [('CS412', 'CS421', 'CS443', 'GER')], [('', 'CS444', 'FREE', 'FREE')], [('', '')]]
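The one-liner can be hard to follow, so here is a minimal, more readable sketch of the same idea: group the sublists by position with zip, turn each sublist into a sorted tuple so that order is ignored but repeated entries are kept, and collect the distinct tuples. The function name unique_combinations_per_semester and the small sample list are made up for illustration, and the sketch assumes every student has the same number of sublists (zip stops at the shortest path). Unlike the set-based one-liner, it keeps the combinations in first-seen order.

```python
def unique_combinations_per_semester(paths):
    """For each sublist position, collect the distinct course combinations,
    ignoring order inside a sublist but keeping repeated entries."""
    result = []
    # zip(*paths) groups the i-th sublist of every student together.
    # Assumption: all paths have the same length (zip stops at the shortest).
    for semester in zip(*paths):
        seen = set()
        unique = []
        for courses in semester:
            # A sorted tuple is hashable, order-insensitive, and keeps duplicates.
            key = tuple(sorted(courses))
            if key not in seen:
                seen.add(key)
                unique.append(key)
        result.append(unique)
    return result


# Hypothetical sample: three students, two semesters each.
sample = [
    [['CS105', 'ENG101', 'MATH101', 'GER'], ['GER', 'FREE', 'FREE']],
    [['CS105', 'MATH101', 'GER', 'ENG101'], ['FREE', 'GER', 'FREE']],
    [['CS105', 'ENG101', 'GER', 'GER'], ['', '']],
]
res = unique_combinations_per_semester(sample)
print(res[0])  # combinations seen in the first semester
print(res[1])  # combinations seen in the second semester
```

Here the first two students count as one combination in both semesters, while the third student's ['CS105', 'ENG101', 'GER', 'GER'] and ['', ''] stay distinct with their repeated entries intact. If you need lists rather than tuples, wrap each key in list() when appending.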