In the nbody program on the BenchmarksGame site, I noticed that sumOfSquares()
is written as follows (at the end of the code):
// compute the sum of squares of a 3-tuple's elements
inline proc sumOfSquares((x,y,z)) {
  return x**2 + y**2 + z**2;
}
I am wondering whether the above notation for the dummy argument (x,y,z) works like "tuple unpacking", so that it is effectively similar to the following?
var (x,y,z) = {actual tuple on the caller side};
To understand it, I tried the following code (tested on the ATO site), and it seems that the actual tuple t is passed to myproc((x,y)) by value, with each component modifiable inside the routine.
More specifically, it appears that lines (1) and (2) below are similar in their meaning, and that (2) and (3) work essentially the same (except for unpacking). Is this understanding correct...?
var t_orig = (1.0, "hello");
var t = t_orig;
writeln("t = ", t);
var (x,y) = t; // (1)
x = 100; y = "hi";
writeln("x = ", x, " y = ", y, " t = ", t); // t -> (1.0, "hello")
ref (p,q) = t; // (1')
p = 100; q = "hi";
writeln("p = ", p, " q = ", q, " t = ", t); // t -> (100.0, "hi")
proc myproc((x, y)) // (2)
{
  x += 1.0;
  y += " world";
  writeln("myproc(): ", (x,y));
}
proc myproc2(in u) // (3)
// proc myproc2(ref u) // t -> (2.0, "hello world")
// proc myproc2(inout u) // t -> (2.0, "hello world")
// proc myproc2(u) // (4) error: 'u' is const and cannot be modified
{
  u[0] += 1.0;
  u[1] += " world";
  writeln("myproc2(): ", u);
}
t = t_orig;
myproc(t); // (2.0, "hello world")
writeln("t = ", t); // (1.0, "hello")
t = t_orig;
myproc2(t);
writeln("t = ", t); // depends on routines
Results from ATO:
t = (1.0, hello)
x = 100.0 y = hi t = (1.0, hello)
p = 100.0 q = hi t = (100.0, hi)
myproc(): (2.0, hello world)
t = (1.0, hello)
myproc2(): (2.0, hello world)
t = (1.0, hello)
I am also wondering why the nbody program uses a function form like (2) rather than (3). Is this mainly for readability of the code, or possibly for performance reasons...?
Thanks for the questions.
I am wondering if the above notation for dummy argument (x,y,z) works like "tuple unpacking", so effectively similar to the following? […] … Is this understanding correct...?
Yes, I think you're understanding things correctly. Brief documentation for this feature is available in the language specification, but arguably could be improved to clarify the behavior.
The one thing I want to emphasize is your mention of "it seems [to be passed by value]…" because, to support compiler optimizations, I believe Chapel intentionally avoids specifying things like how the implementation will pass the tuple or at what point the locally modifiable copies will be made (or even if they will be… for example, if they're not needed).
Ultimately, I believe it is Chapel's intention to support argument intents (like ref or inout) for de-tupled formal arguments like these as well, such that you could write:
proc myproc(ref (x, y)) { … }
// or
proc myproc(inout (x, y)) { … }
and have your procedure modify the original tuple via local assignments to x and y. If supported, this would result in behavior more like your (1')/ref t or inout t cases, respectively. But as you may have found if you tried this, it is not supported today:
error: intents on tuple-grouped arguments are not yet supported
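Until that is supported, one possible workaround is to combine the two pieces that did work in your experiment: take the whole tuple by ref, as in your ref variant of (3), and then unpack it into ref variables inside the body, as in (1'). This is only a sketch based on the behavior you observed on ATO; the name myproc3 is made up for illustration, and I have not checked it against every compiler version.

```chapel
// Hypothetical workaround: pass the whole tuple by 'ref', then
// unpack it into 'ref' aliases, mirroring line (1') in the question.
proc myproc3(ref t) {
  ref (x, y) = t;   // x and y alias the components of the caller's tuple
  x += 1.0;
  y += " world";
}

var t = (1.0, "hello");
myproc3(t);
writeln("t = ", t);  // t is expected to reflect the updates made via x and y
```

The body still gets to use the names x and y rather than t[0] and t[1], at the cost of one extra line for the unpacking.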
I am also wondering why the nbody program uses a function form like (2) rather than (3). Is this mainly for readability of the code, or possibly performance consideration...?
I believe we took this approach primarily for reasons of style rather than performance. Specifically, I tend to avoid indexing into tuples when the result is sufficiently succinct and clear, to avoid getting into 1-based vs. 0-based indexing assumptions or wars. Personally, I also prefer thinking of the components as 'x', 'y', and 'z' in a context like this rather than 't[0]', 't[1]', 't[2]'.
That said, since the CLBG supports comparing codes by compactness, it'd be interesting to see whether a rewrite like:
inline proc sumOfSquares(in t) {
  return t[0]**2 + t[1]**2 + t[2]**2;
}
results in a more compact code using the site's metrics (which involve removing comments and unnecessary whitespace, and then gzipping the result).
In terms of performance, the fact that the procedure is declared as inline will cause its body to be inlined at the callsite, which makes me expect that the back-end compiler would optimize away any implementation differences between the two approaches. That said, I admittedly haven't compared them recently (if ever). If you were to find that they perform differently, that'd be interesting to know. If that were the case, I'm not aware of any reason that the compiler couldn't be improved to make them perform identically. Certainly our goal is to have minor style differences like this perform equivalently.
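If you want to try the comparison yourself, a rough timing harness along the following lines might be a starting point. It is a sketch rather than a careful benchmark: the procedure names, the iteration count n, and the loop body are all made up for illustration, and it assumes the stopwatch record from the Time module in recent Chapel releases. The running sum is printed only so that the calls cannot be trivially discarded.

```chapel
use Time;

// Hypothetical names; the bodies follow the two forms discussed above.
inline proc sumOfSquaresUnpacked((x, y, z)) {
  return x**2 + y**2 + z**2;
}

inline proc sumOfSquaresIndexed(in t) {
  return t[0]**2 + t[1]**2 + t[2]**2;
}

config const n = 10_000_000;  // iteration count, chosen arbitrarily

proc main() {
  var swUnpacked, swIndexed: stopwatch;
  var accUnpacked = 0.0, accIndexed = 0.0;

  // Time the tuple-grouped (unpacked) form.
  swUnpacked.start();
  for i in 1..n do
    accUnpacked += sumOfSquaresUnpacked((i:real, 2.0, 3.0));
  swUnpacked.stop();

  // Time the indexed form on the same inputs.
  swIndexed.start();
  for i in 1..n do
    accIndexed += sumOfSquaresIndexed((i:real, 2.0, 3.0));
  swIndexed.stop();

  writeln("unpacked: ", swUnpacked.elapsed(), " s, sum = ", accUnpacked);
  writeln("indexed:  ", swIndexed.elapsed(), " s, sum = ", accIndexed);
}
```

Compiling with --fast and overriding the count with something like --n=100000000 at run time should give more meaningful numbers than the defaults; the two sums should match if the forms are equivalent.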