Search code examples
pythonpython-3.xoptimizationmemorymemory-management

How can I sort a dictionary by value without creating even a temporary copy


I have a dictionary, which I need to sort by key. I physically do not have enough memory to do it.

x = dict(sorted(x.items(), key=lambda item: item[1])

Is there a way to sort it without creating even a temporary spike in memory usage. I though I could maybe use pop(), to remove items from the original, to keep the same amount of data in memory? But I don't know if there is a simpler way to do it.

My dictionary is about 10^8 objects, which is taking up about 100 Gb. I have about 20-25 Gb of memory free.


Solution

  • Not no extra space, but much less, so maybe still useful for you or others with the same problem. Memory measurements (peak bytes) with 10^5 items:

    12,059,636 baseline
    24,408,256 original
    19,065,304 better 1
    18,709,352 better 2
    14,265,296 sort keys, sort vals
    12,949,792 sort keys, sort vals 2
    

    baseline is for creating the original dict. Your original solution peaks at additional 12.3 MB. My best alternative peaks at additional 0.9 MB.

    Code (Try it online!):

    import tracemalloc as tm
    from random import random
    import gc
    
    n = 10**5
    
    def start(label):
      global x, label_
      label_ = label
      gc.collect()
      tm.start()
      x = {random(): random() for _ in range(n)}
    
    def stop():
      global x
      print(f'{tm.get_traced_memory()[1]:10,}', label_)
      tm.stop()
      if label_ != 'baseline':
        assert len(x) == n
        assert list(x.values()) == sorted(x.values()), list(x.values())
      del x
      gc.collect()
    
    for _ in range(2):
    
      start('baseline')
      stop()
    
      start('original')
      x = dict(sorted(x.items(), key=lambda item: item[1]))
      stop()
    
      start('better 1')
      x = list(x.items())
      x.sort(key=lambda item: item[1])
      x = dict(x)
      stop()
    
      start('better 2')
      ks = list(x)
      ks.sort(key=x.get)
      x = dict(zip(ks, map(x.pop, ks)))
      stop()
    
      start('sort keys, sort vals')
      keys = list(x)
      keys.sort(key=x.get)
      vals = list(x.values())
      del x
      vals.sort()
      x = dict(zip(keys, vals))
      stop()
    
      start('sort keys, sort vals 2')
      keys = list(x)
      keys.sort(key=x.get, reverse=True)
      vals = list(x.values())
      del x
      vals.sort(reverse=True)
      x = {}
      while keys:
        x[keys.pop()] = vals.pop()
      stop()
    
      print()