Search code examples
j

The concept of Rank in J


If I have one dimension data such as

v1 =: 5 $ i.5
v1
0 1 2 3 4

then v1 -"1 0 v1 gives me the Euclidean vectors like

v1 -"1 0 v1
 0  1  2  3 4
_1  0  1  2 3
_2 _1  0  1 2
_3 _2 _1  0 1
_4 _3 _2 _1 0

We can also find Euclidean distance matrix easily.

This is how I find the Euclidean vectors and distance matrix of a 2D vector

 v2=: <"1 ? 5 2 $ 10
   v2
┌───┬───┬───┬───┬───┐
│4 0│4 5│5 7│8 3│6 0│
└───┴───┴───┴───┴───┘
direction_vector=: <"1 @: (-"0 @:(-/"2 @: (>"0 @: (diff))))
distance =:     +/"1 @: *: @: (>"2 @:(direction_vector))
m =:        3 : '(i.(#y)) distance"0 _ y'
m v2
 0 25 50 25  4
25  0  5 20 29
50  5  0 25 50
25 20 25  0 13
 4 29 50 13  0

However, my problem is that I am not sure how to find the Euclidean vectors and distance of 2D data in a smart and clean way

As you can see from the table below, my algorithm took more than 1/3 of time to calculate the direction vectors of data. 14.5 second is not bad, but the problem arises when I have a bigger dataset.

 Time (seconds)
+----------------+------+--------+---------+-----+----+------+
|name            |locale|all     |here     |here%|cum%|rep   |
+----------------+------+--------+---------+-----+----+------+
|direction_vector|base  |6.239582| 5.105430| 35.2| 35 |773040|
|move            |base  |9.741510| 1.753868| 12.1| 47 |  3390|
|script          |base  |1.969949| 1.443148|  9.9| 57 |    18|
|distance        |base  |5.650358| 1.318022|  9.1| 66 |579780|
|enclosestrings  |pcsv  |1.491832| 1.255603|  8.6| 75 |     1|
|diff            |base  |1.134585| 1.134585|  7.8| 83 |773186|
|makedsv         |pcsv  |1.728721| 0.236883|  1.6| 84 |     4|
|norm            |base  |0.221794| 0.221794|  1.5| 86 |  3390|
|xpt             |base  |0.194896| 0.194896|  1.3| 87 |  3390|
|ypt             |base  |0.193579| 0.193579|  1.3| 89 |  3390|
|writedsv        |pcsv  |2.067408| 0.186687|  1.3| 90 |     4|
|cosd            |base  |0.172359| 0.172359|  1.2| 91 |   113|
|[rest]          |      |        | 1.300733|  9.0|100 |      |
|[total]         |      |        |14.517587|100.0|100 |      |
+----------------+------+--------+---------+-----+----+------+

I think I can definitely simplify direction_vector one by using rank, but I got stuck. I tried "2 1 "1 1 "1 2 "_ 1 "1 0 ..., but none of them gave me a clear result.

Can anyone help me on this issue? Thank you!


Solution

  • I would begin with noticing that v2 u/ v2 makes a table out of the items of v2 by applying u between those items. Also, you can simplify this a bit by using u/~ v2 which is the same as v2 u/ v2. The next question is what is u, but before we go there, boxing really slows things down and you don't actually have to box the vector for this to work since the items can already be written like this:

       [ v2=: 5 2 $ 4 0 4 5 5 7 8 3 6 0
    4 0
    4 5
    5 7
    8 3
    6 0
    

    and this makes the items the vectors, which is what you want to be able to use u/~ v2

    Now, back to the question of what we want u to be. We are going to be working on the items of v2 Since u is being fed the items of v2 to make the table, you would like to subtract the items from each other as vectors (rank 1) and then square them and then add them together. Translating this into J you get +/@:*:@:-"1 as u

        +/@:*:@:-"1/~ v2
     0 25 50 25  4
    25  0  5 20 29
    50  5  0 25 50
    25 20 25  0 13
     4 29 50 13  0
    

    If you time this I hope that you will find it much faster than your solution because it does not require boxing. The key area that you were concerned about with rank was that it be applied after the Table adverb /

    Hope this helps even though it is a slightly different approach and let me know what your timings look like.