I have a hope to perform a full self-cross join on a large data file of points. However, I cannot use programming language to perform the operation because I cannot store it in memory. I would like to find all combinations of points within the set. Below would be an example of my dataset.
x y
1 9
2 8
3 7
4 6
5 5
I would like to cross join on this data to generate 25-row table containing all the combination of points. Would there be a low memory solution? perhaps with awk
?
Thank you,
Nicholas Hayden
P.S. I am a novice programmer.
perhaps in two steps, create a header, column1 and column2 files and join the column1 and column2 and append to header file
awk 'NR==1{print > "cross"} NR>1 {print $1 > "col1"; print $2 > "col2"}' file
join -j9 col1 col2 -o1.1,2.1 >> cross
rm col1, col2
obviously make sure the temp and final file names won't clash with the existing ones.
Note, the join
command on MacOS doesn't have the -j
option, so change it to equivalent long form
join -19 -29 col1 col2 -o1.1,2.1 >> cross
in both alternatives we're asking join to use the non-existent 9th field as the key which matches every line of the first file to every line in the second to generate the cross product of the two files.