Tags: scala, apache-spark, apache-spark-sql

How to map multidimensional arrays in Scala


I'm a bit new to Scala and Spark, and I couldn't find an existing answer for this. I have a DataFrame like this:

|Id    |endpoints                  |score            |type      |
|106688|[[clothes:tops], [clothes]]|[[0.01], [0.283]]|[S1S2, S1]|
|107594|[[clothes:tops], [clothes]]|[[0.01], [0.19]] |[S1S2, S1]|
|108800|[[clothes:tops], [clothes]]|[[0.01], [0.052]]|[S1S2, S1]|
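
For anyone who wants to reproduce this, here is a minimal sketch that builds an equivalent DataFrame (the local SparkSession setup and treating Id as a string are assumptions on my part):

    import org.apache.spark.sql.SparkSession

    // Sketch only: a local session and hand-typed sample rows
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("106688", Seq(Seq("clothes:tops"), Seq("clothes")), Seq(Seq(0.01), Seq(0.283)), Seq("S1S2", "S1")),
      ("107594", Seq(Seq("clothes:tops"), Seq("clothes")), Seq(Seq(0.01), Seq(0.19)),  Seq("S1S2", "S1")),
      ("108800", Seq(Seq("clothes:tops"), Seq("clothes")), Seq(Seq(0.01), Seq(0.052)), Seq("S1S2", "S1"))
    ).toDF("Id", "endpoints", "score", "type")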

I need to map these fields onto each other, in the format below:

 Map(S1S2 -> Map(clothes:tops -> 0.01))

What is the best approach to mapping the records of this DataFrame? Basically I need to know how to zip the fields into the format below:

 Map(S1S2 -> Map(clothes:tops -> 0.01), S1 -> Map(clothes -> 0.25))
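
For context, the pairing I am after is what zip does on plain Scala collections. A tiny sketch using the values from the target output above (the val names are only illustrative):

    // `zip` pairs elements by position; `toMap` turns the pairs into a Map
    val labels = Seq("S1S2", "S1")
    val maps   = Seq(Map("clothes:tops" -> 0.01), Map("clothes" -> 0.25))
    val paired = (labels zip maps).toMap
    // paired: Map(S1S2 -> Map(clothes:tops -> 0.01), S1 -> Map(clothes -> 0.25))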

Solution

  • I solved the problem myself; hopefully this answer will be helpful for others too. Basically I had to use the map function. Here is the cleaned-up code, with a quick check of the output after it.

       // Requires `import spark.implicits._` for the tuple Encoder on the result
       val mapped = df.map { r =>
         val id        = r.getAs[String]("Id")
         val endpoints = r.getAs[Seq[Seq[String]]]("endpoints")
         val scores    = r.getAs[Seq[Seq[Double]]]("score")
         // `type` is a reserved word in Scala, so the val needs another name
         val types     = r.getAs[Seq[String]]("type")

         // Pair each endpoint group with its score group, then zip within
         // each pair: Seq(([clothes:tops], [0.01]), ...) becomes
         // Seq(Map(clothes:tops -> 0.01), ...)
         val zipped = endpoints zip scores
         val merged = zipped.map { case (endpoint, score) =>
           (endpoint zip score).toMap
         }

         // Finally pair each type label with its endpoint -> score map
         val endpointsMerged = (types zip merged).toMap
         (id, endpointsMerged.toString)
       }
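
To check the result, collect and print the mapped rows; for the first sample row, the expected output is shown as a comment (values taken from the sample data above):

       // Inspect the mapped rows locally; each element is an (Id, Map.toString) tuple
       mapped.collect().foreach(println)
       // Expected for Id 106688:
       // (106688,Map(S1S2 -> Map(clothes:tops -> 0.01), S1 -> Map(clothes -> 0.283)))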