I have a dictionary, which I need to sort by key. I physically do not have enough memory to do it.
x = dict(sorted(x.items(), key=lambda item: item[1])
Is there a way to sort it without creating even a temporary spike in memory usage. I though I could maybe use pop()
, to remove items from the original, to keep the same amount of data in memory? But I don't know if there is a simpler way to do it.
My dictionary is about 10^8 objects, which is taking up about 100 Gb. I have about 20-25 Gb of memory free.
Not no extra space, but much less, so maybe still useful for you or others with the same problem. Memory measurements (peak bytes) with 10^5 items:
12,059,636 baseline
24,408,256 original
19,065,304 better 1
18,709,352 better 2
14,265,296 sort keys, sort vals
12,949,792 sort keys, sort vals 2
baseline
is for creating the original dict. Your original
solution peaks at additional 12.3 MB. My best alternative peaks at additional 0.9 MB.
Code (Try it online!):
import tracemalloc as tm
from random import random
import gc
n = 10**5
def start(label):
global x, label_
label_ = label
gc.collect()
tm.start()
x = {random(): random() for _ in range(n)}
def stop():
global x
print(f'{tm.get_traced_memory()[1]:10,}', label_)
tm.stop()
if label_ != 'baseline':
assert len(x) == n
assert list(x.values()) == sorted(x.values()), list(x.values())
del x
gc.collect()
for _ in range(2):
start('baseline')
stop()
start('original')
x = dict(sorted(x.items(), key=lambda item: item[1]))
stop()
start('better 1')
x = list(x.items())
x.sort(key=lambda item: item[1])
x = dict(x)
stop()
start('better 2')
ks = list(x)
ks.sort(key=x.get)
x = dict(zip(ks, map(x.pop, ks)))
stop()
start('sort keys, sort vals')
keys = list(x)
keys.sort(key=x.get)
vals = list(x.values())
del x
vals.sort()
x = dict(zip(keys, vals))
stop()
start('sort keys, sort vals 2')
keys = list(x)
keys.sort(key=x.get, reverse=True)
vals = list(x.values())
del x
vals.sort(reverse=True)
x = {}
while keys:
x[keys.pop()] = vals.pop()
stop()
print()