Search code examples
matrixvectorapl

APL - idiomatic way to filter, group, and sort from matrices


I am attempting to learn APL, but I am having a hard time writing code in an idiomatic way. Part of the problem, I believe, is that it is hard to find teaching material sufficiently broad and complete. For instance, consider the following problem: we have a matrix with n rows and 3 columns. Assume that all columns are of the same type. We want to obtain a vector of vectors of the elements on the second column, grouped by the value of the first column, and sorted by the value in the third column. So, for instance, for the input matrix

1 A  1
1 B  3
2 F  5
2 B  2
1 C  8
1 D 10
2 D  9
2 E 13

we would like the output to be

┌────┬────┐
│ABCD│BFDE│
└────┴────┘

I came up with:

{1 ⌷[2]¨{(⍵[;1] {⍺ ⍵}⌸ ⍵[;2 3])[;2]} {↑(↓⍵)[⍋⍵[;3]]} ⍵}

which works, but I am certain that there is a much easier, elegant, and idiomatic way to do this in APL. How would you do it?


Solution

  • input← (⍪1 1 2 2 1 1 2 2),(⍪'ABFBCDDE'),⍪1 3 5 2 8 10 9 13
    {↓x[;1]⊢⌸2⌷[2]x←⍵[⍋⍵[;3];]}input
    {x[;1]⊂⍤⊢⌸2⌷[2]x←⍵[⍋⍵[;3];]}input ⍝ fixed to reflect requested output format
    

    Assume ⎕IO←1.

    We want to obtain a vector of vectors of the elements on the second column, grouped by the value of the first column, and sorted by the value in the third column.

    It is easier to achieve that by sort the input before regrouping using since it is good to know the order is preserved during regrouping. While this behvavior is intuitive enough to be inferred from experimenting with this operator, the language reference clearly says.

    The elements of R appear in the order in which they first appear in Y

    ⊢⌸ is equivalent and shorter form of {⍵}⌸, of course. (And ⊂⍤⊢⌸ equivalent of {⊂⍵}⌸)

    Now, I have formed my solution independently, let's find what is the difference.

    It seems you have observed the behavior of , then just be aware
    that for the input data type of this problem,

    {↑(↓⍵)[⍋⍵[;3]]} ←→ {⍵[⍋⍵[;3];]}

    And it is rare for idiomatic APL code to nest dfns (aka anonymous function). So just open it:

    {1 ⌷[2]¨{(⍵[;1] {⍺ ⍵}⌸ ⍵[;2 3])[;2]} ⍵[⍋⍵[;3];]}
    

    If you don’t remove abstraction in APL code, you will lose the opportunity to seek further optimizations. The next function application can also be opened using local variable assignment,

    {1 ⌷[2]¨(x[;1] {⍺ ⍵}⌸ x[;2 3])[;2]⊣x←⍵[⍋⍵[;3];]}
    

    Now, since only the grouped data instead of index key is needed for the result, remove the redundancy altogether by keeping the data lean and efficient at first place:

    {(x[;1]{⊂⍵}⌸ x[;2])⊣x←⍵[⍋⍵[;3];]}
    

    And squeeze the result a little bit it would be identical to the solution I developed or went for a different method for simplification.

    {⊃⊂⍤⊢⌸/↓⍉⍵[⍋⍵[;3];][;1 2]}