I need to save stream of elements in a size limited list. There may be duplicate elements in the stream but I need to just keep the unique ones. Also when the size of my list exceeds a specified limit, I need to remove the oldest element and add the new one.
I already have tried set
and list
. The problem with set
is that it is not size limited and if I want to remove the oldest element I have no idea how to retrieve it because set is unordered; however, it solves the problem of uniqueness.
On the other hand list
keeps the order of items, but I need to check for possible duplicates whenever I want to insert a new element and this can cost a lot of time. Also list
is not size limited as well as set
.
My third option could be collections.deque
but I don't know if it keeps the order or not. And is there any way to keep the items in collections.deque
unique?
These are examples of my codes for list
:
ids = list()
for item in stream:
if item not in ids:
ids.append(item)
if len(ids) >= len_limit:
del ids[0]
and set
:
ids = set()
for item in stream:
ids.add(item)
if len(ids) >= len_limit:
ids.remove(list(ids)[0])
You can write your own class which keeps both deque ans set:
import collections
class Structure:
def __init__(self, size):
self.deque = collections.deque(maxlen=size)
self.set = set()
def append(self, value):
if value not in self.set:
if len(self.deque) == self.deque.maxlen:
discard = self.deque.popleft()
self.set.discard(discard)
self.deque.append(value)
self.set.add(value)
s = Structure(2)
s.append(1)
s.append(2)
s.append(3)
s.append(3)
print(s.deque) # deque([2, 3], maxlen=2)