Tags: list, python-2.7, duplicates, iteration, pyramid

Looping through a list of objects, removing duplicates, and returning unique values in a view (Python)


I am trying to better understand and put into practice some programming concepts, specifically looping and recursion (or iteration?) through a list that contains values. I have an API that retrieves a list from the database and prints each table entity and its attributes (e.g. [<Assessment(name='teaching', text='something', microseries=3, subseries='3a', etc...)>, <Assessment(name='learning', text='foo', microseries=3, subseries='3b', etc...)>]). There are 1-5 microseries in this list.

Problem: remove duplicate microseries so that the output (the return value) contains each microseries only once (e.g. a single entry for microseries=3, not one entry for every subseries listed under it: 3a, 3b, 3c, 3d). The goal is to display a single microseries (the word is singular here, so don't let the plural-looking name confuse you) in a template so that a user can click on it and go into an expanded subseries view (3a, 3b, 3c, ...).

Table arrangement:
Assessment
|--name
|--text
|--microseries
|--subseries

I am sure there might be a better approach to this and I am all ears to some recommendations. I am a newbie and could use some wise direction on how to tackle this problem.

I am currently using code suggested in earlier Stack questions (1, 2). Please help :) I would like to better understand the concepts behind iterating over a list, removing duplicates, and returning only one unique value per object in the list. Please excuse my lack of tech speak.

I am using Python 2.7, SQLAlchemy, and Pyramid (as the web framework).


view.py (view front-end code)

@view_config(route_name='assessments', request_method='GET', renderer='templates/unique_assessments.jinja2')
def view_unique_assessments(request):
    # other code
    all_assessments = api.retrieve_assessments()
    #print 'all assessments', all_assessments

    new_list = list(set(all_assessments)) #removes duplicates
    new_list.sort() #sorts items in list
    print 'new list', new_list # sorted and unique list

    for x in new_list:
        print 'test', x #prints test <Assessment(name='Foo', text='this is 1A', microseries='1', etc...)>    

        micro = set([x.microseries]) #doesn't work 
        print 'test micro single print', micro #doesn't iterate over the list and print out each unique microseries -- only prints one
        #prints: test micro single print set([3]) instead of 1,2,3,4,5

    return {'logged_in': logged_in_userid, 'unique_microseries': micro}

Database Table:

class Assessment(Base):
    __tablename__ = 'assessments'

    id = Column(Integer, primary_key=True)
    name = Column(String(50), unique=True)
    text = Column(String(2000))
    microseries = Column(Integer)
    subseries = Column(String(50))
    created_on = Column(DateTime, default=datetime.utcnow)

API:

def retrieve_assessments(self):
    assessments = self.session.query(Assessment).order_by(Assessment.id).all()
    return assessments

Solution

  • The usual approach, as @Mikko suggests, is to use a dict (or sometimes a set) to keep track of which items you have already seen during the iteration: if the item's key is already in the dict, you just skip it and go on to the next one. Afterwards, you call the dict's .values() method to get the objects you kept.

    def view_unique_assessments(request):
        all_assessments = api.retrieve_assessments()
        assessments_by_microseries = {}
    
        for x in all_assessments:     
            if x.microseries in assessments_by_microseries:
                print("Already seen this microseries: %s" % x.microseries)
            else: 
                assessments_by_microseries[x.microseries] = x
    
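        # Sort by microseries explicitly rather than relying on default object ordering
        unique_assessments = sorted(assessments_by_microseries.values(),
                                    key=lambda a: a.microseries)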
        return {'logged_in': logged_in_userid, 'unique_assessments': unique_assessments}
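
To try the idea without a database, here is a minimal, self-contained sketch. It assumes a hypothetical namedtuple stand-in for the SQLAlchemy Assessment rows, and the helper name first_per_microseries and the sample rows beyond 'teaching' and 'learning' are made up for illustration. It also shows why the attempt in the question kept only one value: micro = set([x.microseries]) builds a fresh one-element set on every pass, so only the last row survives the loop. And set(all_assessments) most likely removes nothing, because (assuming the model defines no __eq__ or __hash__) each row object is distinct even when its microseries matches another row's.

```python
from collections import namedtuple

# Hypothetical stand-in for the SQLAlchemy Assessment rows (illustration only).
Assessment = namedtuple('Assessment', ['name', 'microseries', 'subseries'])


def first_per_microseries(assessments):
    """Return the first assessment seen for each microseries, ordered by microseries."""
    seen = {}
    for a in assessments:
        # Skip rows whose microseries has already been recorded.
        if a.microseries not in seen:
            seen[a.microseries] = a
    return [seen[m] for m in sorted(seen)]


rows = [
    Assessment('teaching', 3, '3a'),
    Assessment('learning', 3, '3b'),
    Assessment('planning', 1, '1a'),
    Assessment('grading', 2, '2a'),
    Assessment('reviewing', 1, '1b'),
]

# One representative object per microseries.
unique_rows = first_per_microseries(rows)

# If only the numbers are needed, a single set built from all rows is enough.
unique_numbers = sorted(set(a.microseries for a in rows))

print([a.subseries for a in unique_rows])
print(unique_numbers)
```

With the sample rows above, the first print shows the subseries of the row kept for each microseries, and the second shows the distinct microseries numbers. Either result could be passed to the template. The dict version is the one to use when the template needs other attributes of the row, such as name.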