Search code examples
pythondjangoperformancenested-loops

More effective loop in Python


I have situation where I need to loop over two lists of objects and find equal and then loop over their fields and change some attributes. Looks it like this

for new_product in products_and_articles['products']:
  for old_product in products_for_update:
    if new_product.article == old_product.article:
      for old_field in old_product._meta.get_all_field_names():
        for new_field in new_product._meta.get_all_field_names():
          if old_field == new_field and old_field != 'id' and old_field != 'slug':
            setattr(old_product, old_field, getattr(new_product, old_field))

Obviously this is far from being good or even acceptable. So I'm seeking for advice how can I avoid so much looping and enhance the algorythm


Solution

  • It helps if you break the process into logical, reusable parts.

    for new_product in products_and_articles['products']:
      for old_product in products_for_update:
        if new_product.article == old_product.article:
          …
    

    For example, here what you are doing is find a product that matches a particular article. Since article is unique, we could write something like this:

    def find_products_by_article(products, article):
      '''Find all products that match the given article.  Returns
      either a product or 'None' if it doesn't exist.'''
      for products in products:
        return product
    

    Then call it with:

    for old_product in products_for_update:
      new_products = find_products_by_article(
                       products_and_articles['products'],
                       old_product.article)
      …
    

    But this could be much more efficient if we could take advantage of a data structure that is optimized for look-ups, namely a dict (constant instead of linear complexity). So what we could do instead is:

    # build a dictionary that stores products indexed by article
    products_by_article = dict(product.article, product for product in
                               products_and_articles['products'])
    
    for old_product in products_for_update:
      try:
        # look up product in the dictionary
        new_product = products_by_article[old_product.article]
      except KeyError:
        # silently ignore products that don't exist
        continue
      …
    

    If you do such lookups frequently, it would be better to reuse the products_by_article dictionary elsewhere as well instead of building one from scratch every time. Be careful though: if you use multiple representations of the product records, you need make they always stay in sync!

    For the inner loops, notice that new_field here only serves as a check for whether a field exists:

    …
      for old_field in old_product._meta.get_all_field_names():
        for new_field in new_product._meta.get_all_field_names():
          if old_field == new_field and old_field != 'id' and old_field != 'slug':
            setattr(old_product, old_field, getattr(new_product, old_field))
    

    (Note that this is slightly suspicious: any new fields that don't already exist in the old_product are silently discarded: is this intentional?)

    This can be repackaged as follows:

    def transfer_fields(old, new, exclusions=('id', 'slug')):
      '''Update all pre-existing fields in the old record to have
      the same values as the new record.  The 'exclusions' parameter
      can be used to exclude certain fields from being updated.'''
      # use a set here for efficiency reasons
      fields = frozenset(old._meta.get_all_field_names())
      fields.difference_update(new._meta.get_all_field_names())
      fields.difference_update(exclusions)
      for field in fields:
        setattr(old, field, getattr(new, field))
    

    Putting all this together:

    # dictionary of products indexed by article
    products_by_article = dict(product.article, product for product in
                               products_and_articles['products'])
    
    for old_product in products_for_update:
      try:
        new_product = products_by_article[old_product.article]
      except KeyError:
        continue          # ignore non-existent products
      transfer_fields(old_product, new_product)
    

    This final code has a time complexity of O(n × k), where n is the number of products and k is the number of fields.