I am attempting to understand the classic quicksort in APL:
Q←{1≥≢⍵:⍵ ⋄ S←{⍺⌿⍨⍺ ⍺⍺ ⍵} ⋄ ⍵((∇<S)⍪=S⍪(∇>S))⍵⌷⍨?≢⍵}
There are some things I don't understand, and some stylistic choices that bother me, so I'm going to list all of them out. I hope someone can explain them to me.
{ }
defn, ⍺
is the left argument, and ⍵
is the right argument. What is ⍺⍺
in S←{⍺⌿⍨⍺ ⍺⍺ ⍵}
? Similarly, is there an ⍵⍵
? Does the ⍺
inside the S
refer to the left argument of S
or the left argument of Q
? My guess is that ⍺
inside the S
refers to the left argument of S
. The ⍺⍺
refers to the ⍺
of the enclosing function (ie, the ⍺
of Q).
⍨
)? Is the code not far clearer written as:Q←{1≥≢⍵:⍵ ⋄ S←{(⍺ ⍺⍺ ⍵)⌿⍺} ⋄ ⍵((∇<S)⍪=S⍪(∇>S))⍵[?≢⍵]}
The only use I can think of using commute is to reduce the use of brackets ()
and []
, but that hardly seems worth the loss in legibility. Am I missing something of the "APL way" here?
This is not actually performing quicksort, is it? Quicksort is defined as being in-place. However, my understanding of the APL semantics is that this piece of code in fact builds new arrays on the recursive sub-calls, and the concatenates them using ⍪
. Indeed, this is the same criticism that is levelled at Haskell's quicksort. Is there something I am missing in the APL semantics that informs that this operation is done "in-place"? Do note that I am not interested in "sufficiently smart compiler" arguments, since array analysis is fundamentally challenging. If the APL compiler does actually turn this into an in-place algorithm, I would greatly value details on how it manages to perform this analysis --- that's quite an accomplishment!
Why the use of ≢⍵
to find the dimension size? Why not ⍴⍵
? In general, I find that people use ≢
over ⍴
to query for the size along the outermost dimension, even when the only case where the function work is in 1D. Again, I presume there is something in the APL way that I am missing.
Thanks a lot.
- I understand that inside a
{ }
defn,⍺
is the left argument, and⍵
is the right argument. What is⍺⍺
inS←{⍺⌿⍨⍺ ⍺⍺ ⍵}
? Similarly, is there an⍵⍵
?
S←{⍺⌿⍨⍺ ⍺⍺ ⍵}
is called a dop. Similar to a dfn being a user-defined function, dop is a user-defined operator which behaves like ¨
, ⍨
, or ∘
.
Summary of its semantics:
⍺⍺
(not ⍵⍵
), it becomes a monadic operator, which you can use as x (m S) y
.⍵⍵
, it becomes a dyadic operator, which you can use as x (m S n) y
.⍺⍺
(which will have the value of m
) and ⍵⍵
(which will have that of n
) can be an array or a function.⍺
in the dop's body, in which case you can call it as either (m S) y
or (m S n) y
, omitting the left argument.m
is called the left operand, and n
is the right operand. These are different things from a left argument (x
) and right argument (y
).In your example, S
mentions only ⍺⍺
, so it is called as x (m S) y
. If you call S
like 1 2 3 >S 2
, it will evaluate to 1 2 3⌿⍨1 2 3 > 2
, which will be a single 3.
Does the
⍺
inside theS
refer to the left argument ofS
or the left argument ofQ
?
Inside the body of S
, everything made of ⍺
and ⍵
characters refers to the argument/operand of S
. The original arguments of Q
are invisible (unless they are assigned to a variable first, in which case they are visible as the variable name).
- Why the copious use of commute (
⍨
)?
I believe it is mostly a stylistic choice. I also prefer to write parenthesized code instead of using it in production code, except when the usage is easily recognizable as an APL idiom. I do write e.g. 3÷⍨whatever
instead of (whatever)÷3
for division by constant.
- This is not actually performing quicksort, is it?
You're correct. As you mentioned already, Quicksort is designed to run in-place to truly become Quick (TM). APL can do memory pre-allocation and array sharing to reduce some of the memory copy and allocation, but at least some copies are unavoidable when the three sub-arrays (having elements less than / equal to / greater than the pivot) are created and later concatenated.
One thing to note is that, unlike Haskell, APL does have in-place assignment which looks like x[i]←v
. If one were to implement Quicksort properly in APL, one would have to code like one would code in C (passing indices to recursive calls and such).
- Why the use of
≢⍵
to find the dimension size? Why not⍴⍵
?
≢
is called "tally" while ⍴
is called "shape". ≢
always returns a scalar value, while ⍴
returns a vector (it would be a length one vector if given a vector argument). While a scalar and a length-one vector behave the same in most scenarios, they are different things, e.g. 1≡,1
is false.
I believe it's a good habit to differentiate the two, favoring a scalar over a length-one vector whenever possible. One notable exception is when you need an explicitly enclosed array (a scalar can't be enclosed).