
How to pair items from a nested vector with corresponding single values in Clojure?


I'm learning Clojure, and want to understand more about sequences. I have a real-life problem which I have reduced to a general one, but I don't know if it has a canonical name. Hopefully the example below makes it clear.

Say I have two vectors, src and dst. The items in the src vector are themselves vectors, and I need to map each item of each inner vector to the corresponding value in dst.

(def src [ ["a1" "a2" "a3"] ["b1" "b2"] ["c1" "c2" "c3" "c4"] ])
(def dst [ "a" "b" "c" ])

I want to produce the following map:

{ :a1 "a", :a2 "a", :a3 "a", :b1 "b", :b2 "b", :c1 "c", :c2 "c", :c3 "c", :c4 "c" }

I can do this just fine in Python, but the Clojure way of doing it is not clear to me. For this problem, I could just construct a map, but I want to be able to do it in a generic way, not just for this instance.

In Python, this would be:

src = [['a1', 'a2', 'a3'], ['b1', 'b2'], ['c1', 'c2', 'c3', 'c4']]
dst = ['a', 'b', 'c']
result = {}
for (s, d) in zip(src, dst):
    for x in s:
        result[x] = d

In Clojure, I've tried starting with:

(interleave src dst)
;=> (["a1" "a2" "a3"] "a" ["b1" "b2"] "b" ["c1" "c2" "c3" "c4"] "c")

So now the two vectors are interleaved, but I don't know how to iterate over the inner vectors to get the map keys, and pick the matching values.

Also, zipmap doesn't get me too far by itself:

(zipmap src (map keyword dst))
;=> {["a1" "a2" "a3"] :a, ["b1" "b2"] :b, ["c1" "c2" "c3" "c4"] :c}
;bogus result

Now I would need to flip the map keys and values, and still iterate.

I haven't been successful in constructing a for expression either:

(for [s src] (zipmap s dst))
;=> ({"a1" "a", "a2" "b", "a3" "c"} {"b1" "a", "b2" "b"} {"c1" "a", "c2" "b", "c3" "c"})
;bogus result

I'm approaching the problem as pairing two vectors, but I can't seem to get the vectors from the src vector into position so that I could just simply zipmap each of them with dst.

I suspect the answer is really obvious, but my brain is still not working functionally enough. Maybe there is an into {} and/or assoc in there somewhere.

Any pointers? If you're interested, the real-life problem I mentioned is mapping from RNA codons to amino acids.


Solution

  • map can take multiple seqs to iterate over, e.g.:

    (map + [1 2 3] [4 5 6])
    ;; => (5 7 9)
    

    So this is how to get the values you want to process into the same function call: the function receives ["a1" "a2" "a3"] together with "a", then ["b1" "b2"] together with "b", and so on.

    (map
      (fn [src dst]
        ???)
      [["a1" "a2" "a3"] ["b1" "b2"] ["c1" "c2" "c3" "c4"]]
      ["a" "b" "c"])
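
    To see exactly which pairs the two-argument function receives, one option is to pass vector as the function. This is only an illustrative sketch (the names pairs and truncated are made up for the example); it also shows that map stops at the shortest input.

    ```clojure
    (def src [["a1" "a2" "a3"] ["b1" "b2"] ["c1" "c2" "c3" "c4"]])
    (def dst ["a" "b" "c"])

    ;; vector just collects its arguments, so each element of the result
    ;; is one [inner-vector value] pair, taken position by position.
    (def pairs (map vector src dst))

    ;; map stops as soon as the shortest input runs out,
    ;; so the 3 in the first vector is never paired with anything.
    (def truncated (map vector [1 2 3] [:x :y]))
    ```

    Here pairs holds three elements, the first being [["a1" "a2" "a3"] "a"], and truncated holds only two.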
    

    zipmap takes a seq of keys (which we have) and a seq of values (which we have to construct from a single value). repeat can be used to create an infinite lazy seq based on a constant value:

    (take 3 (repeat "a"))
    ;; => ("a" "a" "a")
    

    And:

    (zipmap ["a1" "a2" "a3"] (repeat "a"))
    ;; => {"a3" "a", "a2" "a", "a1" "a"}
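
    The infinite seq is safe here because zipmap stops as soon as either of its inputs runs out. A small sketch of that behaviour, with names (two, four, explicit) invented for the example:

    ```clojure
    ;; zipmap pairs keys with values until either sequence is exhausted,
    ;; so an infinite (repeat ...) on the value side still terminates.
    (def two  (zipmap ["b1" "b2"] (repeat "b")))
    (def four (zipmap ["c1" "c2" "c3" "c4"] (repeat "c")))

    ;; Giving repeat an explicit count also works, but it is not needed.
    (def explicit (zipmap ["b1" "b2"] (repeat 2 "b")))
    ```

    The same expression therefore handles inner vectors of any length without counting their items first.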
    

    This makes the original code look like this:

    (map
      (fn [src dst]
        (zipmap src (repeat dst)))
      [["a1" "a2" "a3"] ["b1" "b2"] ["c1" "c2" "c3" "c4"]]
      ["a" "b" "c"])
    ;; => ({"a3" "a", "a2" "a", "a1" "a"} {"b2" "b", "b1" "b"} {"c4" "c", "c3" "c", "c2" "c", "c1" "c"})
    

    And finally, you can merge all of these maps into a single one using into, resulting in this final piece of code:

    (into {} (map #(zipmap %1 (repeat %2)) src dst))
    ;; => {"a3" "a", "c2" "c", "c3" "c", "a1" "a", "b2" "b", "c4" "c", "a2" "a", "c1" "c", "b1" "b"}
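
    The map above has string keys, whereas the question asked for keyword keys such as :a1. One possible way to get those, sketched here with a for comprehension, is to build [key value] pairs and convert each key with keyword. The helper name group-map and its parameter names are hypothetical, chosen only for this example.

    ```clojure
    (def src [["a1" "a2" "a3"] ["b1" "b2"] ["c1" "c2" "c3" "c4"]])
    (def dst ["a" "b" "c"])

    ;; group-map is a hypothetical helper name, not part of clojure.core.
    (defn group-map
      "Maps every item of each group, converted to a keyword,
      to the value at the same position in values."
      [groups values]
      (into {}
            (for [[group v] (map vector groups values) ; pair each group with its value
                  k group]                             ; then walk the items of that group
              [(keyword k) v])))

    (def result (group-map src dst))
    ```

    This mirrors the nested Python loop from the question: the outer binding plays the role of zip(src, dst) and the inner binding walks each group. Maps are unordered, so the printed order of the entries may differ from the order in the question.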