I have an OD matrix (origin-destination matrix) written in list form, like this inputfile.csv
Which reads as:
Origin-destination pairs with 0 trips (the zero elements of the matrix) are not present in the input file.
I need to compute the symmetric matrix as S=(OD+DO)/2, but the main problem is that inputfile.csv is 30GB in size. I thought that a tool like awk could be a good solution, but I don't know how to proceed. I think that the pseudo algorithm should be something like this:
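Read a line and keep origin1, destination2 and trips12 (where origin1,destination2 can be any origin_id or destination_id).
If the reverse pair destination2,origin1 is present in the file, keep trips21 and write:
origin1,destination2 --> (trips12+trips21)/2
Otherwise, write:
origin1,destination2 --> (trips12)/2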
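To make the steps above concrete, here is a minimal Python sketch on a tiny made-up example. The function name `symmetrize` and the sample pairs are hypothetical and not taken from inputfile.csv. It assumes a missing reverse pair counts as 0 trips and that both cells of the symmetric matrix should be written.

```python
def symmetrize(od):
    """Return S = (OD + DO) / 2 for a sparse OD matrix.

    `od` maps (origin, destination) -> trips; absent pairs count as 0 trips.
    """
    sym = {}
    for (origin, destination), trips in od.items():
        # trips21, or 0 when the reverse pair is absent (assumption)
        reverse = od.get((destination, origin), 0)
        mean = (trips + reverse) / 2
        sym[(origin, destination)] = mean
        sym[(destination, origin)] = mean  # S is symmetric: fill both cells
    return sym


# Hypothetical sample data, not from the real file.
od = {("A", "B"): 10, ("B", "A"): 4, ("A", "C"): 6, ("C", "C"): 8}
for pair, value in sorted(symmetrize(od).items()):
    print(pair, value)
```

Everything is held in a dictionary here, so this only illustrates the arithmetic; it is not meant for a 30GB input.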
I think that awk can be great for this task, but I am open to using any suggested tool (python, perl, octave, etc.)
awk -F"\"" '{a[$2$4]==$6;if $4$2 ...}' inputfile.csv
No clue how to do it...
Desired output:
How much RAM do you have? Would this approach work?
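awk 'BEGIN {
  FS = OFS = ","
}
NR == 1 { print; next }   # assumes the first line is a header: pass it through
{
  gsub("\"", "", $3)      # strip the quotes around the trip count
  a[$1 FS $2] = $3        # trips keyed by origin,destination
  b[$2 FS $1] = $3        # the same trips keyed by the reversed pair
}
END {
  for (i in a) {
    if (i in b) {
      print i, "\"" (a[i] + b[i]) / 2 "\""
    } else {              # assumed: no reverse pair, so it counts as 0 trips
      print i, "\"" a[i] / 2 "\""
    }
  }
}' inputfile.csv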
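If the two arrays do not fit in RAM, one possible alternative (a different technique from the awk script above) is to emit every row and its reverse with half the trips, sort the result, and then sum adjacent rows that share a key. The sketch below is in Python under stated assumptions: the names `half_rows` and `sum_sorted`, the `trips` column name, and the sample rows are hypothetical, and the in-memory `sorted()` call stands in for an on-disk sort that a 30GB file would need.

```python
import csv
import io
from itertools import groupby


def half_rows(rows):
    """Yield (origin, destination, trips / 2) for each row and its reverse."""
    for origin, destination, trips in rows:
        half = float(trips) / 2
        yield origin, destination, half
        # On the diagonal the row is its own reverse, so this second yield
        # brings the total back to the full trip count.
        yield destination, origin, half


def sum_sorted(rows):
    """Sum the values of adjacent rows sharing (origin, destination).

    Only one group is held in memory, but `rows` must already be sorted.
    """
    for key, group in groupby(rows, key=lambda r: (r[0], r[1])):
        yield key[0], key[1], sum(r[2] for r in group)


# Hypothetical sample standing in for inputfile.csv (header plus quoted fields).
sample = io.StringIO(
    '"origin_id","destination_id","trips"\n'
    '"A","B","10"\n"B","A","4"\n"A","C","6"\n"C","C","8"\n'
)
reader = csv.reader(sample)
next(reader)  # skip the assumed header row
# sorted() is only for this demo; a 30GB file would need an on-disk sort.
for row in sum_sorted(sorted(half_rows(reader))):
    print(*row, sep=",")
```

The design choice is that after sorting, each pair's two halves sit next to each other, so the summing pass never needs more than one group in memory.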