I'm working on a small set of Python scripts, and I came across this:
line = "a b c d e f g"
a, b, c, d, e, f, g = line.split()
I'm quite aware that these are decisions taken during implementation, but shouldn't Python offer (or does it already offer) something like:
_, _, var_needed, _, _, another_var_needed, _ = line.split()
the way Prolog does, in order to exclude the famous singleton variables.
I'm not sure, but wouldn't that avoid unnecessary allocation? Or does creating references to the results of the split() call not count as overhead?
EDIT:
Sorry, my point here is: in Prolog, as far as I know, in a predicate like:
test(L, N) :-
    test(L, 0, N).

test([], N, N).
test([_|T], M, N) :-
    V is M + 1,
    test(T, V, N).
the variable represented by _ is not accessible; I suppose the reference to the value that does exist in the list [_|T] is not even created. But in Python, if I use _, I can still use the last value assigned to _, and I also suppose the assignment happens for each occurrence of _ -- which may be considered an overhead.
My question is whether there shouldn't be (or whether there already is) a syntax to avoid such unnecessary assignments.
_ is a perfectly valid variable name, and yes, you can use a variable multiple times in an unpacking operation, so what you've written will work. _ will end up with the last value assigned on the line. Some Python programmers do use it this way.
_ is used for special purposes by some Python interactive shells (it holds the result of the last expression), which may confuse some readers, and so some programmers avoid it for that reason.
There's no way to avoid the allocation with str.split(): it always splits the whole line, and the resulting strings are all allocated. It's just that, in this case, some of them don't live very long. But then again, who does?
You can avoid some of the allocations with, say, re.finditer():
import re

fi = re.finditer(r"\S+", line)
next(fi)  # skip "a"
next(fi)  # skip "b"
var_needed = next(fi).group()
next(fi)  # skip "d"
next(fi)  # skip "e"
another_var_needed = next(fi).group()
# we don't care about the last match so we don't ask for it
But next() returns a Match object, so it will be allocated anyway (and immediately discarded, since we're not saving it anywhere). So you really only save the final allocation. If your strings are long, the fact that you're getting a Match object and not a string could save some memory and even time, I guess; I think the matched substring is not sliced out of the source string until you ask for it. You could profile it to be sure.
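Incidentally, the chain of next() calls can be written a bit more compactly with itertools.islice. This is purely a cosmetic variant of the snippet above; the skipped Match objects are still allocated and discarded:

```python
import re
from itertools import islice

line = "a b c d e f g"
fi = re.finditer(r"\S+", line)
var_needed = next(islice(fi, 2, 3)).group()          # skip 2 matches, take the 3rd
another_var_needed = next(islice(fi, 2, 3)).group()  # skip 2 more, take the next
print(var_needed, another_var_needed)  # c f
```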
You could even generalize the above into a function that returns only the desired tokens from a string:
import re

def get_tokens(text, *toknums):
    toknums = set(toknums)
    maxtok = max(toknums)
    for i, m in enumerate(re.finditer(r"\S+", text)):
        if i in toknums:
            yield m.group()
        elif i > maxtok:
            break

var1, var2 = get_tokens("a b c d e f g", 2, 5)
But it still ain't exactly pretty.