
Join function in Scala


Is it possible to join lists in Scala, similar to what can be done with Spark or Pandas dataframes? For example,

def findMatch(hosts: List[Person], guests: List[Person]): List[(Person, Person)] = ??? // project, filter and join

The intent is to specify the logic for merging collections in Scala along the lines of SQL, using SELECT, JOIN, WHERE and other verbs.

If my understanding is correct, one could use Spark, but it would be too slow for an online application. More importantly, doing the join directly on the lists makes the logic a domain-level specification.


Solution

  • The short answer is that there is nothing like that built in. You can generate a cartesian product and filter out what you don't need based on some condition:

    def join[A,B](left:List[A], right: List[B])(f: (A,B) => Boolean):List[(A,B)] =
      for {
        l <- left
        r <- right if(f(l,r))
      } yield (l,r)
    

    This works as long as the collections are small. The operation has O(n * m) time complexity, where n is the size of the first collection and m the size of the second.

    For equality conditions, you could try foldLeft with a dictionary (a Map) as the accumulator: index one collection by the join key so that each element of the other becomes a lookup, which reduces time complexity at the cost of extra space. Tree-based structures are another option; you would have to weigh the trade-offs based on your needs.

    I don't think you can easily do a full SQL join using just simple collections. That's why databases and tools like Spark or pandas exist.
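
As a rough illustration of the dictionary idea above, here is a minimal sketch of a hash-based equi-join. It assumes the join condition is key equality, and it uses groupBy to build the index instead of a hand-written foldLeft. The names HashJoinSketch, hashJoin, leftKey and rightKey, and the Person fields, are made up for the example and do not come from the question.

```scala
object HashJoinSketch {
  // Hypothetical domain type, used only for illustration.
  final case class Person(name: String, city: String)

  // Equi-join: pairs each left element with every right element that has the same key.
  def hashJoin[A, B, K](left: List[A], right: List[B])(
      leftKey: A => K,
      rightKey: B => K
  ): List[(A, B)] = {
    // Build phase: index the right side by key (O(m) time and extra space).
    val index: Map[K, List[B]] = right.groupBy(rightKey)
    // Probe phase: one map lookup per left element (O(n) plus the size of the output).
    for {
      l <- left
      r <- index.getOrElse(leftKey(l), Nil)
    } yield (l, r)
  }
}
```

With hosts and guests as lists of Person, a call such as HashJoinSketch.hashJoin(hosts, guests)(_.city, _.city) would pair each host with the guests from the same city. This only covers equality conditions; an arbitrary predicate still needs the nested loop shown earlier.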