I'm trying to find out if there are faster ways to sort and modify lists in python, other than the native "sorted" function.
Specific details: I have multiple objects from a custom class that I want to sort based on their Z attribute, which is an integer value. Ideally the objects are stored in a dictionary, but a list works as well if it is more performant. They will get updated 60 times per second. And at any frame any Z attribute of any object might change and the list needs to get sorted accordingly. There might be objects added or removed from time to time, but the optimization is oriented especially towards sorting the list/dictionary after an object's Z value changes.
I could also update the list one object or one Z value change at a time and only change that object's position in the sorting, if there is a fast way to do this.
I enquired if heaps would be a good solution for this but they are not. In that linked post I got suggestions for B-trees, AVL trees, red-black trees, skip lists and Python Sorted Containers, but I'm not sure if they are helpful in my specific use case.
I tried looping a single object through a list until it reaches the correct Z position, but that method is also significantly slower than sorting the entire list using native sorted() command. I pasted the sample code for that test below.
Does anyone know if there are faster more efficient ways of sorting for my specific use case? If so, what would a sample code of a basic implementation be?
Sample code (this code is merely for performance testing and is not meant to be fully optimized):
import random
from statistics import mean
import time
class Object:
def __init__(self):
self.Z = random.randint(0,1000000)
def __repr__(self):
return str(self.Z)
def __str__(self):
return str(self.Z)
# Creating a sorted list first:
list_base = [Object() for i in range(10000)]
sorted_list = sorted(list_base, key = lambda name: name.Z)
# New object that will be added to the beginning of the list with a higher Z value than the others:
new_obj = Object()
new_obj.Z = 1000001
# Store the timing results in lists for performance testing:
results_native = []
results_custom = []
# Run performance test multiple time:
for i in range(200):
# inserting new object at the beginning of the list:
# current index:
index = 0
sorted_list.insert(index, new_obj)
# Sort method native:
start_time = time.time()
sorted_list = sorted(sorted_list, key = lambda name: name.Z)
finish_time_1 = time.time() - start_time
results_native.append(finish_time_1)
# Inserting the object back to the beginning of the list to repeat the test:
sorted_list.insert(0, sorted_list.pop(-1))
# Sort method Custom looping:
start_time = time.time()
index += 1
while not index >= len(sorted_list) and sorted_list[index].Z <= new_obj.Z:
index += 1
sorted_list.insert(index - 1, sorted_list.pop(0))
finish_time_2 = time.time() - start_time
results_custom.append(finish_time_2)
print("Native finish time:", mean(results_native))
print("Custom loop finish time:", mean(results_custom))
I tested and compared bisect and sorted containers. Here's the median test result for 1000 tests:
Native finish time: 0.0036494266986846925
Bisect finish time: 3.2123804092407225e-05
Sorted Container finish time: 1.5070676803588867e-05
Sorted Container lists have the most performant sorting procedure and therefor I consider the solution for this post!
The code I used for comparing the performance:
import numpy as np
import bisect
from sortedcontainers import SortedList
import random
from statistics import mean
import time
class Object:
def __init__(self):
self.Z = random.randint(0,1000000)
def __repr__(self):
return str(self.Z)
def __str__(self):
return str(self.Z)
# Creating a sorted list first:
list_base = [Object() for i in range(10000)]
container_sorted_list = SortedList(list_base, key=lambda x: x.Z)
# New object that will be added to the beginning of the list with a higher Z value than the others:
new_obj = Object()
new_obj.Z = 1000001
# Store the timing results in lists for performance testing:
results_native = []
bisect_results = []
sorted_container_results = []
print("START PERFORMANCE TEST")
# Run performance test multiple time:
for i in range(1000):
print(i)
#NATIVE
# inserting new object at the beginning of the list:
# current index:
sorted_list = sorted(list_base, key=lambda name: name.Z)
index = 0
sorted_list.insert(index, new_obj)
# Sort method native:
start_time = time.time()
sorted_list = sorted(sorted_list, key=lambda name: name.Z)
finish_time_1 = time.time() - start_time
results_native.append(finish_time_1)
# BISECT:
sorted_list_bisect = sorted(list_base, key=lambda name: name.Z)
start_time = time.time()
# insert the new value in the correct position
bisect.insort(sorted_list_bisect, new_obj, key=lambda name: name.Z)
finish_time = time.time() - start_time
bisect_results.append(finish_time)
# Making sure bisect gives the same answer as your native method.
if not (np.array(sorted_list) == np.array(sorted_list_bisect)).all():
print("Sorted lists do not match :(")
raise Exception
# SORTED CONTAINER:
start_time = time.time()
container_sorted_list.add(new_obj)
finish_time = time.time() - start_time
sorted_container_results.append(finish_time)
container_sorted_list.remove(new_obj)
print("Native finish time:", mean(results_native))
print("Bisect finish time:", mean(bisect_results))
print("Sorted Container finish time:", mean(sorted_container_results))