As a student, I have a course about using new features in Python3.8+ and it also includes Python3.10. So I'm now trying to use match-case in my homework to kill a long series of if-else which should be replace by switch-case in cpp.
In my homework I need a function to convert string "male/female/..." into integer. It's quite easy but I was suddenly interested in the performace of different string-compare methods so I designed the testing code below to test the performance among three different methods.
import timeit
from typing import Callable
def test_match_case(sex: str):
match sex:
case 'male':
gender = 1
case 'female':
gender = 2
case '________female':
gender = 3
case '#________________female':
gender = 4
case _:
gender = 0
return gender
def test_if_full_compare(sex: str):
if sex == 'male':
gender = 1
elif sex == 'female':
gender = 2
elif sex == '________female':
gender = 3
elif sex == '#________________female':
gender = 4
else:
gender = 0
return gender
def test_if_startswith(sex: str):
if sex.startswith('m'):
gender = 1
elif sex.startswith('f'):
gender = 2
elif sex.startswith('_'):
gender = 3
elif sex.startswith('#'):
gender = 4
else:
gender = 0
return gender
def _test_and_print(func: Callable, arg: str, times: int = 10000000):
func_name = func.__name__
time_cost = timeit.timeit(f"""{func_name}("{arg}")""", f"""from __main__ import {func_name}""", number=times)
print(f"func: {func_name} arg: {arg} time_cost: {time_cost:.6f}")
_test_and_print(test_match_case, "male")
_test_and_print(test_match_case, "female")
_test_and_print(test_match_case, "________female")
_test_and_print(test_match_case, "#________________female")
_test_and_print(test_if_full_compare, "male")
_test_and_print(test_if_full_compare, "female")
_test_and_print(test_if_full_compare, "________female")
_test_and_print(test_if_full_compare, "#________________female")
_test_and_print(test_if_startswith, "male")
_test_and_print(test_if_startswith, "female")
_test_and_print(test_if_startswith, "________female")
_test_and_print(test_if_startswith, "#________________female")
Here is the testing result on my PC:
func: test_match_case arg: male time_cost: 0.517885
func: test_match_case arg: female time_cost: 0.610869
func: test_match_case arg: ________female time_cost: 0.726382
func: test_match_case arg: #________________female time_cost: 0.846604
func: test_if_full_compare arg: male time_cost: 0.487761
func: test_if_full_compare arg: female time_cost: 0.578670
func: test_if_full_compare arg: ________female time_cost: 0.666413
func: test_if_full_compare arg: #________________female time_cost: 0.827917
func: test_if_startswith arg: male time_cost: 0.917373
func: test_if_startswith arg: female time_cost: 1.362152
func: test_if_startswith arg: ________female time_cost: 1.817514
func: test_if_startswith arg: #________________female time_cost: 2.249916
I have learned a little about "Resident Memory" in Python, but I still have tons of questions about the result above.
I've already searched with words "[python] string compare" but didn't get any expected result.
There's three reasons for your performance issues, the first two essentially the language/interpreter's "fault", the other one a consequence of how you constructed your test cases:
match
can't typically optimize much (if I recall correctly, it might even be required to consider case
s in order, making it even harder to optimize), and in practice, never performs meaningful optimizations for "jumping" to the right case
. The match
construct's design actually adds a tiny amount of overhead over if
/elif
/else
chains here, but it's otherwise equivalent to the if
/elif
/else
chain, making it slightly slower in most cases (see details below; there's a reason it catches up on the final case).==
checks), so if the work done is the same, the bytecode operation wins. They've been improving this over time with LOAD_METHOD
/CALL_METHOD
opcodes that avoid creating bound method objects, and the Vectorcall protocol to reduce argument handling overhead, but a dedicated bytecode still wins in most cases, so if ==
and startswith
do the same real work, ==
will win.startswith
doesn't save anything in this particular use case; every one of your fixed strings is of a different length, so the actual equality test starts off with "do the lengths match?", finds they don't, and immediately says "not equal"; it never even reads a single byte of the string data when they're not equal. And when they are equal, odds are since three of the four strings involved are literals containing legal variable names (which CPython interns as an implementation detail), isn't not even comparing the string's data in the case where they're equal; it checks the "is interned" bit, finds its set in both, and knowing that, can compare the pointer addresses (it might even do this before the size check); two interned strings are equal if and only if they are the same string, so again, no data comparisons required.Details on point #1/question #2: I checked the results of calling dis.dis
on two identical functions, one of which used match
/case
and one of which used if
/elif
/else
chains. The bytecode produced by CPython 3.10.0 is almost identical, except:
match
/case
chain loads the string only once, then before each test save the last one, it uses the DUP_TOP
instruction to double the value on the top of the stack (just double the reference, no copies made) so the new copy can be used for the comparison, and the original one sticks around for the next comparison. The DUP_TOP
is a super-cheap instruction (roughly equivalent to the LOAD_FAST
it uses in the if
chain where you have to name the local variable over and over), but...POP_TOP
instruction to clear the duplicate that the if
/elif
chain doesn't need.This explains your observation in question #2; all the code paths that end with a case
that involves a DUP_TOP
execute one extra bytecode instruction (the corresponding POP_TOP
), while the final case doesn't need it so the instructions executed end up being roughly the same (it executes one LOAD_FAST
and case count DUP_TOP
s, rather than case count LOAD_FAST
s); DUP_TOP
being very slightly cheaper than LOAD_FAST
(but not nearly enough to make up the cost of a POP_TOP
when it's needed), you perform slightly better on the final case
of the match
that uses the value being matched.