I'm learning Clojure, and want to understand more about sequences. I have a real-life problem which I have reduced to a general one, but I don't know if it has a canonical name. Hopefully the example below makes it clear.
Say I have two vectors, src and dst. The items in the src vector are themselves vectors, and I need to map each item in each vector into the corresponding value in dst.
(def src [ ["a1" "a2" "a3"] ["b1" "b2"] ["c1" "c2" "c3" "c4"] ])
(def dst [ "a" "b" "c" ])
I want to produce the following map:
{ :a1 "a", :a2 "a", :a3 "a", :b1 "b", :b2 "b", :c1 "c", :c2 "c", :c3 "c", :c4 "c" }
I can do this just fine in Python, but the Clojure way of doing it is not clear to me. For this problem, I could just construct a map, but I want to be able to do it in a generic way, not just for this instance.
In Python, this would be:
src = [['a1', 'a2', 'a3'], ['b1', 'b2'], ['c1', 'c2', 'c3', 'c4']]
dst = ['a', 'b', 'c']
result = {}
for (s, d) in zip(src, dst):
    for x in s:
        result[x] = d
In Clojure, I've tried starting with:
(interleave src dst)
;=> (["a1" "a2" "a3"] "a" ["b1" "b2"] "b" ["c1" "c2" "c3" "c4"] "c")
So I've flattened the vectors, but I don't know how to iterate over the map keys, and pick the values.
Also, zipmap doesn't get me too far by itself:
(zipmap src (map keyword dst))
;=> {["c1" "c2" "c3" "c4"] :c, ["b1" "b2"] :b, ["a1" "a2" "a3"] :a}
;bogus result
Now I would need to flip the map keys and values, and still iterate.
I haven't been successful in constructing a for expression either:
(for [s src] (zipmap s dst))
;=> ({"a3" "c", "a2" "b", "a1" "a"} {"b2" "b", "b1" "a"} {"c3" "c", "c2" "b", "c1" "a"})
;bogus result
I'm approaching the problem as pairing two vectors, but I can't seem to get the vectors from the src vector into position so that I could simply zipmap each of them with dst.
I suspect the answer is really obvious, but my brain is still not working functionally enough. Maybe there is an into {} and/or assoc in there somewhere.
Any pointers? If you're interested, the real-life problem I mentioned is mapping from RNA codons to amino acids.
map can take multiple seqs to iterate over, e.g.:
(map + [1 2 3] [4 5 6])
;; => (5 7 9)
So this is how to get the values you want to process into the same function: it gets called with the pairs ["a1" "a2" "a3"] / "a", and so on:
(map
  (fn [src dst]
    ???)
  [["a1" "a2" "a3"] ["b1" "b2"] ["c1" "c2" "c3" "c4"]]
  ["a" "b" "c"])
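To see exactly which arguments the function in that skeleton receives, one option is to plug in vector as the function. This is only a sketch for inspection; the name pairs is made up here, and src and dst are the vectors from the question:

```clojure
(def src [["a1" "a2" "a3"] ["b1" "b2"] ["c1" "c2" "c3" "c4"]])
(def dst ["a" "b" "c"])

;; map calls the function with one element from each seq at a time,
;; so vector simply collects each [group value] pair.
(def pairs (map vector src dst))

(println pairs)
```

Each element of pairs is a two-element vector holding one group of keys and the single value that group should map to.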
zipmap takes a seq of keys (which we have) and a seq of values (which we have to construct from a single value). repeat can be used to create an infinite lazy seq based on a constant value:
(take 3 (repeat "a"))
;; => ("a" "a" "a")
And:
(zipmap ["a1" "a2" "a3"] (repeat "a"))
;; => {"a3" "a", "a2" "a", "a1" "a"}
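Passing an infinite seq to zipmap is safe because zipmap stops as soon as either the keys or the values run out. A small sketch of that behavior (the names group and truncated are just for illustration):

```clojure
;; The infinite (repeat "a") is only consumed as far as there are keys.
(def group (zipmap ["a1" "a2" "a3"] (repeat "a")))

;; The same rule applies the other way round: with fewer values than keys,
;; the extra key :z is silently dropped.
(def truncated (zipmap [:x :y :z] [1 2]))

(println group)
(println truncated)
```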
This makes the original code look like this:
(map
  (fn [src dst]
    (zipmap src (repeat dst)))
  [["a1" "a2" "a3"] ["b1" "b2"] ["c1" "c2" "c3" "c4"]]
  ["a" "b" "c"])
;; => ({"a3" "a", "a2" "a", "a1" "a"} {"b2" "b", "b1" "b"} {"c4" "c", "c3" "c", "c2" "c", "c1" "c"})
And finally, you can merge all of these maps into a single one using into, resulting in this final piece of code:
(into {} (map #(zipmap %1 (repeat %2)) src dst))
;; => {"a3" "a", "c2" "c", "c3" "c", "a1" "a", "b2" "b", "c4" "c", "a2" "a", "c1" "c", "b1" "b"}
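Since the question also tried a for expression and showed keyword keys in the desired result, here is one possible alternative along those lines. It is a sketch rather than the only way to do it: for walks the [group value] pairs and then each key inside the group, which mirrors the nested Python loop, and keyword converts the string keys. The name result is made up here.

```clojure
(def src [["a1" "a2" "a3"] ["b1" "b2"] ["c1" "c2" "c3" "c4"]])
(def dst ["a" "b" "c"])

(def result
  (into {}
        (for [[ks v] (map vector src dst) ; outer loop: one group of keys plus its value
              k ks]                       ; inner loop: each key in that group
          [(keyword k) v])))              ; a two-element vector becomes a map entry

(println result)
```

Dropping the keyword call gives the same string-keyed map as the zipmap version.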