I want to check whether a list is a valid sequence of chunks, where each chunk begins with some value and ends with the next occurrence of the same value. For example, this is a valid sequence of three chunks:
lst = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4, 5, 2]
\___________/ \_____/ \_______________________/
And this is one is invalid:
lst = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4]
\___________/ \_____/ \_____ ... missing the 2 to end the chunk
I have a solution but it's bad. Do you see something better?
def is_valid(lst):
while lst:
start = lst.pop(0)
if start not in lst:
return False
while lst[0] != start:
lst.pop(0)
lst.remove(start)
return True
# Tests, should print: True, False, True, False, True
print(is_valid([2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4, 5, 2]))
print(is_valid([2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4]))
print(is_valid(['I', 'N', 'O', 'A', 'I', 'L', 'L', 'T', 'R', 'X', 'I', 'I', 'N', 'X', 'F', 'T']))
print(is_valid(['T', 'I', 'N', 'I', 'X', 'R', 'O', 'F', 'T', 'I', 'N', 'I', 'X', 'L', 'L', 'A']))
print(is_valid([]))
How about this, creating an iter
from the list and searching forward on that iter until the next
matching element is found. Note that this might fail is None
can be an element of the list; then you should rather define and compare against a sentinel obj = object()
.
def is_valid(lst):
it = iter(lst)
for x in it:
if next((y for y in it if y == x), None) is None:
return False
return True
Since we don't actually need the value returned by next
, we can also just use any
instead, at the same time solving the problem of the default
element. Like next
, any
will consume the iterator just as far as the matching element, if any:
def is_valid(lst):
it = iter(lst)
for x in it:
if not any(y == x for y in it):
return False
return True
This can be further shortened using all
instead of the outer for
loop:
def is_valid(lst):
it = iter(lst)
return all(any(y == x for y in it) for x in it)
And this can finally be reduced to the equally cryptic and intriguing:
def is_valid(lst):
it = iter(lst)
return all(x in it for x in it)
Each way, each element is visited exactly once, the original list is not changed, little to no extra space, and IMHO it's even somewhat easy to read and understand.
This never was about speed, but anyway: Here are some benchmarks of the different solutions (and some more variations), running the test cases from the question as well as two random lists of 1,000 integers, one valid and one invalid, 10,000 times, on Python 3.8.10:
# with long lists # only short test lists
1.52 is_valid_index 0.22 is_valid_index
3.28 is_valid_next 0.30 is_valid_next
2.78 is_valid_for_for_else 0.13 is_valid_for_for_else
5.26 is_valid_for_any 0.32 is_valid_for_any
5.29 is_valid_all_any 0.38 is_valid_all_any
3.42 is_valid_all_any_if 0.36 is_valid_all_any_if
2.02 is_valid_all_in 0.18 is_valid_all_in
1.97 is_valid_all_in_if 0.17 is_valid_all_in_if
1.87 is_valid_for_in 0.11 is_valid_for_in
Of course, all are O(n). With the long 1000-element-lists, the solution using index
is fastest, but the one with x in it
is not too bad, either. The any
solutions lag somewhat behind, but are about as fast (or slow) as next
when using a generator with condition, but still slower than when using plain for
loops.
With only the short test-lists, it's a bit different: Here, the solutions using one iterator and for-for-else
and for-in
are fastest by quite some margin.