Find the count of word pairs in kdb+

I have a file which contains multiple rows of item codes, as follows. There are 1 million rows similar to these:

  1.  123,134,256,345,789.....
  2.  123,256,345,678,789......
   .
   .

I would like to count all the pairs of words/items that occur together in a row, across the whole file, using q in kdb+. Any two words that occur in the same row count as a word pair, e.g.:

(123,134), (123,256), (134,256), (123,345), (123,789), (134,789) are some of the word pairs in row 1.

(123,256), (123,345), (123,345), (678,789), (345,789) are some of the word pairs in row 2.

Expected word/item pair counts:

  123,134 --- 1
  123,256 --- 2
  345,789 --- 2

I am reading the file using read0, and I have been able to split each line into a list using vs and count the individual words using count each group. Now I want to find the count of all the word pairs per row in the file.
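
For reference, the steps described above might look roughly like the following minimal sketch. It assumes the items are comma-separated, uses a literal list of strings in place of the read0 result so that it runs on its own, and the names lines and rows are hypothetical.

```q
/ Stand-in for the read0 output: one string per row of the file.
/ With a real file this would be e.g. lines:read0 `:items.csv (hypothetical path)
lines:("123,134,256,345,789";"123,256,345,678,789")

/ Split every line on commas, then cast each row's strings to symbols
rows:`$'"," vs/: lines

/ Count how often each single item occurs across all rows
count each group raze rows
```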

Thanks in advance for your help

Solution

I'm not 100% sure I understand your definition of a word pair. Perhaps you could expand a little if my logic doesn't match what you were looking for.

In the example below, I've created a 5x5 matrix of symbols for testing, selected the distinct pairs of values from each row, and then counted how many rows each of these pairs appears in.

Please double check with your own results.

q)test:5 cut`$string 25?5

q)test
2 0 1 0 0
2 4 4 2 0
1 0 0 3 4
2 1 1 4 4
3 0 3 4 0

q)count each group raze {l[where(count'[l:distinct distinct each asc'[x cross x:distinct x]])>1]} each test
0 2| 2
1 2| 2
0 1| 2
2 4| 2
0 4| 3
1 3| 1
1 4| 2
0 3| 2
3 4| 2
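
The one-liner above can be hard to read, so here is a hedged, step-by-step sketch of the same idea. It is not the exact expression used above: instead of sorting each pair and dropping duplicates, it sorts the distinct items of a row once and keeps only the pairs whose first item is less than the second, which should give the same set of pairs per row (the order of keys in the result may differ). The function name pairs and the variables lines and rows are hypothetical, and the input is assumed to be comma-separated strings such as read0 would return.

```q
/ pairs: all unordered pairs of distinct items in one row, each pair sorted ascending
/   u: the distinct items of the row, sorted
/   p: every ordered pair from u, including (a;a) and both (a;b) and (b;a)
/   the final step keeps only pairs with first<second, so each pair appears once
pairs:{u:asc distinct x; p:u cross u; p where p[;0]<p[;1]}

/ Stand-in for read0 output, using the two sample rows from the question
lines:("123,134,256,345,789";"123,256,345,678,789")

/ Split on commas and cast each row to a symbol vector
rows:`$'"," vs/: lines

/ Collect the pairs of every row into one list and count the rows each pair occurs in
count each group raze pairs each rows
```

Because each row contributes a pair at most once, the count for a pair is the number of rows that contain both items. Note that cross builds every ordered pair before filtering, so the work per row grows with the square of the number of distinct items in that row.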