Tags: r, combinations, permutation, large-data, geosphere

Calculate large number of permutations in R


I have two large data frames in R, each with roughly 100k rows, holding lists of geographic coordinates (lat/long). I want to generate every combination of a row from one with a row from the other, and then apply a function to each pair.

Because the number of combinations is around 11 billion (11 × 1,000,000,000), my original idea of using a loop is not feasible.

The dataframes would resemble something like:

    A <- as.data.frame(cbind(rbind(-0.1822, -0.4419, 0.2262), rbind(51.5307, 51.4856, 51.4535)))
    (...)

         V1        V2
    -0.1822   51.5307
    -0.4419   51.4856
     0.2262   51.4535

    B <- as.data.frame(cbind(rbind(-0.4764, -0.2142, -0.2197), rbind(51.5221, 51.4593, 51.5841)))
    (...)

         V1        V2
    -0.4764   51.5221
    -0.2142   51.4593
    -0.2197   51.5841

I would like the output to look like:

        V1a       V2a       V1b       V2b
    -0.1822   51.5307   -0.4764   51.5221
    -0.4419   51.4856   -0.4764   51.5221
     0.2262   51.4535   -0.4764   51.5221
    -0.1822   51.5307   -0.2142   51.4593
    -0.4419   51.4856   -0.2142   51.4593
    (...)

Another post here on Stack Overflow ("Calculating great-circle distance matrix") suggests using:

    apply(A, 1, FUN=function(X) distHaversine(X, B))

However, I suspect that the resulting matrix would be too large to hold in memory, so the calculation would never complete.
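A rough back-of-the-envelope check of that suspicion, assuming every distance is stored as a double-precision value (8 bytes, which is how R stores numerics) and taking the 11 billion figure above at face value. The variable names are only illustrative:

```r
# Rough size estimate for a full distance matrix.
n_pairs <- 11e9            # number of combinations quoted above
bytes_per_double <- 8      # R numeric values are 8-byte doubles

# Size in gigabytes (10^9 bytes) of the distances alone,
# ignoring any temporary copies made during the computation.
size_gb <- n_pairs * bytes_per_double / 1e9
size_gb
```

Any intermediate copies made by `apply()` or `cbind()` would come on top of that figure.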

Any ideas on how to solve this efficiently? Keep in mind that my objective is then to apply the Haversine function to calculate the distances between the points.

Thanks J


Solution

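  • # Index every (row of A, row of B) pair, then bind the matching rows side by side.
    cmb <- expand.grid(seq_len(nrow(A)), seq_len(nrow(B)))
    cbind(A[cmb[, 1], ], B[cmb[, 2], ])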
    

    Unlike Andre's solution, this won't create combinations of the columns within each of A and B (his creates 81 rows, whereas for this sample, only 9 are desired). Not sure if this will work for your larger dataset, though.
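For the full-size data, one possible way around the memory problem is to never build the whole table of pairs: process A in chunks of rows, compute the distances from each chunk to all of B, reduce them straight away, and discard the chunk. The sketch below is only an illustration under several assumptions. `haversine_m` and `nearest_in_chunks` are hypothetical helpers written in base R (not part of geosphere), the columns are assumed to be (longitude, latitude) in degrees, the Earth radius of 6378137 m is assumed to match the `distHaversine` default, and "keep the nearest point of B for each row of A" is just an example reduction, since the question does not say what is done with the distances.

```r
# Sample data from the question: V1 = longitude, V2 = latitude (assumed).
A <- data.frame(V1 = c(-0.1822, -0.4419, 0.2262), V2 = c(51.5307, 51.4856, 51.4535))
B <- data.frame(V1 = c(-0.4764, -0.2142, -0.2197), V2 = c(51.5221, 51.4593, 51.5841))

# Hypothetical helper: vectorised Haversine distance in metres.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: for each row of A, find the nearest row of B,
# holding only a chunk_size x nrow(B) distance matrix at any one time.
nearest_in_chunks <- function(A, B, chunk_size = 100) {
  nearest_idx <- integer(nrow(A))
  nearest_dist <- numeric(nrow(A))
  for (s in seq(1, nrow(A), by = chunk_size)) {
    rows <- s:min(s + chunk_size - 1, nrow(A))
    # Distance matrix for this chunk only: one row per A point, one column per B point.
    d <- outer(rows, seq_len(nrow(B)), function(i, j) {
      haversine_m(A$V1[i], A$V2[i], B$V1[j], B$V2[j])
    })
    # Reduce immediately: column index of the smallest distance in each row.
    idx <- max.col(-d, ties.method = "first")
    nearest_idx[rows] <- idx
    nearest_dist[rows] <- d[cbind(seq_along(rows), idx)]
  }
  data.frame(nearest_b = nearest_idx, dist_m = nearest_dist)
}

res <- nearest_in_chunks(A, B, chunk_size = 2)
res
```

The chunk size trades memory for the number of loop iterations: with 100k rows in B, a chunk of 100 rows of A needs a matrix of about 10 million doubles, plus temporaries. A different reduction (for example, counting points within a given radius) would slot into the same place as the `max.col()` call.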