This answer and its comments provide some insight into the inner working's of CPython's str.join()
:
list
or a tuple
, a new list
is created with the same contents.This seems questionable to me. For starters, why reject all sequence types but two? Wouldn't just iterating over any sequence twice instead of copying it be much faster? And why make a list
, particularly if you can't know the length of the iterable you're making it from? You don't need random access, just repeated iteration, and using a list
means you might have to reallocate and copy several times during its generation. Wouldn't it make more sense to use a linked list or a deque
?
Can anyone provide some insights into these design decisions?
For starters, why reject all sequence types but two? Wouldn't just iterating over any sequence twice instead of copying it be much faster?
The argument of join
need not be a sequence. It be any iterable, and some iterables cannot be iterated over more than once. It could, for instance, be a generator expression, which would be exhausted after iterating over it once.
As to your second question, I don't know specifically, although I'd guess that using lists and tuples internally simplifies the implementation at the C level. I think the broader answer to your question is that no one was really intending to make every possible optimization to str.join
. I would guess that the vast majority of use cases are calling it on a list or tuple anyway.