Which of the following designs is better in this scenario, and why?
A:
stop_words = ['com1', 'com2']
def clean_text(text_tokens, stop_words):
    return [token for token in text_tokens if token not in stop_words]
clean_text(['hello', 'world', 'com1', 'com2'], stop_words)
B:
def clean_text(text_tokens):
    stop_words = ['com1', 'com2']
    return [token for token in text_tokens if token not in stop_words]
clean_text(['hello', 'world', 'com1', 'com2'])
C:
STOP_WORDS = ['com1', 'com2']
def clean_text(text_tokens):
    return [token for token in text_tokens if token not in STOP_WORDS]
clean_text(['hello', 'world', 'com1', 'com2'])
Added version C based on @MisterMiyagi's answer.
Note1: In this context, stop_words is fixed and does not change.
Note2: stop_words can be a small or a very large list.
Middle ground: use a default value for the argument.
def clean_text(text_tokens, stop_words={'com1', 'com2'}):
    return [token for token in text_tokens if token not in stop_words]
clean_text(['hello', 'world', 'com1', 'com2'])
Now the constant {'com1', 'com2'} is only created once (when the function is defined); it doesn't pollute the global scope; and, if you end up wanting to, you can optionally pass different stop_words when you call clean_text.
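One caveat with this approach: a plain set is mutable, so the function could accidentally modify the shared default. A minimal sketch of a safer variant follows, assuming an immutable frozenset is acceptable as the default (my substitution, not part of the suggestion above). It also shows that the default is stored on the function object rather than rebuilt on each call, and that membership tests against a set stay fast even when stop_words is large.

```python
def clean_text(text_tokens, stop_words=frozenset({'com1', 'com2'})):
    # The default frozenset is built once, when the def statement runs,
    # and cannot be mutated by accident inside the function.
    return [token for token in text_tokens if token not in stop_words]


tokens = ['hello', 'world', 'com1', 'com2']

# Uses the built-in default stop words.
default_result = clean_text(tokens)

# Overrides the stop words for this call only.
custom_result = clean_text(tokens, stop_words={'hello'})

# The default lives on the function object, not in the global scope.
stored_default = clean_text.__defaults__[0]
```

Here default_result keeps only 'hello' and 'world', while custom_result drops 'hello' and keeps the rest. Callers can pass any container that supports `in`, though a set or frozenset is the better fit when the stop-word collection is large.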