I have a file which contains multiple rows of item codes, as follows. There are 1 million rows similar to these:
1. 123,134,256,345,789.....
2. 123,256,345,678,789......
.
.
I would like to find the count of all pairs of words/items per row in the file using q in kdb+, i.e. any two words that occur in the same row are considered a word pair. For example:
(123,134), (123,256), (134,256), (123,345), (123,789), (134,789) are some of the word pairs in row 1.
(123,256), (123,345), (678,789), (345,789) are some of the word pairs in row 2.
Expected output:
word/item pair   count
123,134          1
123,256          2
345,789          2
I am reading the file using `read0`, and I have been able to convert each line into a list using `vs` and to count the individual words using `count each group`. Now I want to find the count of all the word pairs per row in the file.
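A minimal sketch of that step, assuming a hypothetical file `items.txt` with one comma-separated row per line and no leading row numbers (the names `lines` and `rows` are only illustrative):

```q
/ hypothetical file path; replace with the real one
lines:read0 `:items.txt
/ split each line on commas and cast the pieces to symbols
rows:{`$","vs x} each lines
/ number of occurrences of each individual item across the whole file
count each group raze rows
```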
Thanks in advance for your help
I'm not 100% sure I understand your definition of a word pair. Perhaps you could expand a little if my logic doesn't match what you were looking for.
In the example below, I've created a 5x5 matrix of symbols for testing, selected the distinct pairs of values from each row, and then counted how many rows each of these pairs appears in overall.
Please double check with your own results.
q)test:5 cut`$string 25?5
q)test
2 0 1 0 0
2 4 4 2 0
1 0 0 3 4
2 1 1 4 4
3 0 3 4 0
q)count each group raze {l[where(count'[l:distinct distinct each asc'[x cross x:distinct x]])>1]} each test
0 2| 2
1 2| 2
0 1| 2
2 4| 2
0 4| 3
1 3| 1
1 4| 2
0 3| 2
3 4| 2
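Applied to comma-separated lines like those in the question, the same idea might look like the sketch below. The names `lines`, `rows` and `pairs` are illustrative, and the two sample lines contain only the item codes visible in the question; for the real file, `lines` would come from `read0`. Note that this is a variation rather than the code above: instead of `cross` followed by filtering, it sorts the distinct items of a row and joins each item with the items after it, so every unordered pair is produced exactly once.

```q
/ sample input: only the item codes shown in the question (the real rows are longer)
lines:("123,134,256,345,789";"123,256,345,678,789")
/ split each line on commas and cast the pieces to symbols
rows:{`$","vs x} each lines
/ all unordered pairs of distinct items in one row:
/ sort the distinct items, then join item y with every item that follows it
pairs:{s:asc distinct x; raze {x[y],/:(y+1)_x}[s] each til count s}
/ count the number of rows in which each pair occurs
count each group raze pairs each rows
```

On these two sample lines, the pairs 123,256 and 345,789 should come out with a count of 2 and 123,134 with a count of 1, in line with the expected output in the question. A row of n distinct items yields n(n-1)/2 pairs, so with 1 million rows the intermediate list of pairs can become large; processing the file in chunks and summing the resulting dictionaries may be worth considering.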