
Is there a generic way to consolidate a list of maps based on specific matching keys, while grouping the data of others?


I'm looking to nicely group some data from a list of maps, effectively attempting to "join" on given keys, and consolidating values associated with others. I have a (non-generic) example.

I work at a digital marketing company, and my particular example involves consolidating a list of maps that represent click counts on our sites from different device types.

Example data is as follows

(def example-data
  [{:site-id "439", :pid "184", :device-type "PC", :clicks 1}
   {:site-id "439", :pid "184", :device-type "Tablet", :clicks 2}
   {:site-id "439", :pid "184", :device-type "Mobile", :clicks 4}
   {:site-id "439", :pid "3", :device-type "Mobile", :clicks 6}
   {:site-id "100", :pid "200", :device-type "PC", :clicks 3}
   {:site-id "100", :pid "200", :device-type "Mobile", :clicks 7}])

I want to "join" on the :site-id and :pid keys, while consolidating the :device-types and their corresponding :clicks into a map themselves: a working solution would result in the following list

[{:site-id "439", :pid "184", :device-types {"PC" 1, "Tablet" 2, "Mobile" 4}}
 {:site-id "439", :pid "3", :device-types {"Mobile" 6}} 
 {:site-id "100", :pid "200", :device-types {"PC" 3, "Mobile" 7}}]

So, I do have a working solution for this specific transformation, which is as follows:

(defn consolidate-click-counts [click-counts]
  (let [[ks vs] (->> click-counts 
                     (group-by (juxt :site-id :pid))
                     ((juxt keys vals)))
        consolidate #(reduce (fn [acc x]
                               (assoc acc (:device-type x) (:clicks x)))
                             {}
                             %)]
    (map (fn [[site-id pid] devs]
           {:site-id site-id :pid pid :device-types (consolidate devs)})
         ks
         vs)))

While this works for my immediate use, the solution feels a little clumsy to me and is strongly tied to this exact transformation. I've been trying to think of what a more generic version of this function would look like, perhaps with the keys to join and consolidate on passed as parameters. I think it would also be ideal to be able to provide some kind of resolving fn for duplicate (not-joined-on) keys, sort of like merge-with (e.g. if I had two maps with a matching :site-id, :pid, and :device-type, I would probably want to add the click counts together), but maybe that's too much.

I'm not sold on my grouping method either; perhaps it would be better to have another list of maps, a la

[{:site-id "439", 
  :pid "184", 
  :grouped-data [{:device-type "PC", :clicks 1}
                 {:device-type "Tablet", :clicks 2}
                 {:device-type "Mobile", :clicks 4}]}
 {:site-id "439", 
  :pid "3", 
  :grouped-data [{:device-type "Mobile", :clicks 6}]} 
 {:site-id "100", 
  :pid "200", 
  :grouped-data [{:device-type "PC", :clicks 3} 
                 {:device-type "Mobile", :clicks 7}]}]

Solution

  • Your general approach is reasonable, but the combination of group-by and your custom assoc lambda passed to reduce is more easily replicated with merge-with merge, a common idiom for combining data from multiple maps with shared keys:

    (defn consolidate-click-counts [click-counts]
      (for [[k v] (apply merge-with merge 
                         (for [m click-counts]
                           {(select-keys m [:site-id :pid]) 
                            {(:device-type m) (:clicks m)}}))]
        (assoc k :device-types v)))
    

    Notice I also use a map {:site-id s :pid p} as the intermediate map key, rather than the vector [s p]. Both are fine, but the map is easier to get to, since select-keys produces it directly. It also avoids having to repeat the key names multiple times in the implementation.
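
    To make the intermediate step concrete, here is a minimal sketch of the map that merge-with merge builds when each row contributes a one-entry map of device type to clicks. The names sample-clicks, per-row, and intermediate are hypothetical, introduced only for illustration, and the data is a cut-down version of example-data.

    ```clojure
    ;; Small sample in the same shape as example-data (hypothetical name).
    (def sample-clicks
      [{:site-id "439", :pid "184", :device-type "PC", :clicks 1}
       {:site-id "439", :pid "184", :device-type "Tablet", :clicks 2}
       {:site-id "439", :pid "3", :device-type "Mobile", :clicks 6}])

    ;; One single-entry map per input row: the join keys form the map key,
    ;; the device type and its click count form the value.
    (def per-row
      (for [m sample-clicks]
        {(select-keys m [:site-id :pid])
         {(:device-type m) (:clicks m)}}))

    ;; merge-with merge combines the values of rows that share a join key.
    (def intermediate
      (apply merge-with merge per-row))

    ;; intermediate is now:
    ;; {{:site-id "439", :pid "184"} {"PC" 1, "Tablet" 2}
    ;;  {:site-id "439", :pid "3"}   {"Mobile" 6}}
    ```

    Each entry of that map already holds everything needed for one output row, which is why the outer for only has to assoc the merged value onto the key.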

    I've written this basic function many times; see How to merge duplicated keys in list in vectors in Clojure? for another recent example.

    You ask about combining multiple maps with the same keys, where you'd want to add together the click counts. That's not hard either: just tweak which part of the submap goes into the "key" section of the intermediate map, and change the merge function:

    (defn consolidate-click-counts [click-counts]
      (for [[k v] (apply merge-with + 
                         (for [m click-counts]
                           {(select-keys m [:site-id :pid :device-type]) 
                            (:clicks m)}))]
        (assoc k :clicks v)))
    

    And we can see this works fine to group duplicate keys:

    => (consolidate-click-counts (concat example-data example-data))
    ({:site-id "439", :pid "184", :device-type "PC", :clicks 2}
     {:site-id "439", :pid "184", :device-type "Tablet", :clicks 4} 
     {:site-id "439", :pid "184", :device-type "Mobile", :clicks 8} 
     {:site-id "439", :pid "3", :device-type "Mobile", :clicks 12}
     {:site-id "100", :pid "200", :device-type "PC", :clicks 6}
     {:site-id "100", :pid "200", :device-type "Mobile", :clicks 14})
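
    If you want both behaviours at once (group device types under :site-id and :pid, and add up clicks for repeated device types), one possible sketch is to nest the two merges by using (partial merge-with +) as the combining function. The names consolidate-and-sum and sample-clicks are hypothetical and not part of the code above.

    ```clojure
    ;; Sample with a repeated (:site-id, :pid, :device-type) combination.
    (def sample-clicks
      [{:site-id "439", :pid "184", :device-type "PC", :clicks 1}
       {:site-id "439", :pid "184", :device-type "PC", :clicks 5}
       {:site-id "439", :pid "184", :device-type "Mobile", :clicks 4}
       {:site-id "100", :pid "200", :device-type "PC", :clicks 3}])

    ;; Outer merge-with groups rows by the join keys; the inner
    ;; (partial merge-with +) sums clicks when a device type repeats.
    (defn consolidate-and-sum [click-counts]
      (for [[k v] (apply merge-with (partial merge-with +)
                         (for [m click-counts]
                           {(select-keys m [:site-id :pid])
                            {(:device-type m) (:clicks m)}}))]
        (assoc k :device-types v)))

    (consolidate-and-sum sample-clicks)
    ;; => ({:site-id "439", :pid "184", :device-types {"PC" 6, "Mobile" 4}}
    ;;     {:site-id "100", :pid "200", :device-types {"PC" 3}})
    ```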
    

    You asked about extracting a function that parameterizes over the variables in this algorithm. I don't really think it's worth doing, since the function is so small already and easy enough to read: I'd rather just read another apply merge-with/for loop than remember some new function somebody wrote that abstracts it for me. But if you disagree, it is as easy as defining a new function with parameters for the stuff one could reasonably fiddle with:

    (defn map-combiner [group-keys output-key inspect combine]
      (fn [coll]
        (for [[k v] (apply merge-with combine 
                      (for [m coll]
                        {(select-keys m group-keys) 
                         (inspect m)}))]
          (assoc k output-key v))))
    
    (def consolidate-click-counts 
         (map-combiner [:site-id :pid :device-type] :clicks :clicks +))
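
    As a rough illustration of how the same parameters could cover the two output shapes from the question, here is a sketch that repeats map-combiner so it runs on its own. The names sample-clicks, by-device-map, and by-grouped-data are hypothetical, and the inspect and combine arguments shown are just one possible choice.

    ```clojure
    ;; map-combiner repeated from above so the snippet is self-contained.
    (defn map-combiner [group-keys output-key inspect combine]
      (fn [coll]
        (for [[k v] (apply merge-with combine
                           (for [m coll]
                             {(select-keys m group-keys)
                              (inspect m)}))]
          (assoc k output-key v))))

    (def sample-clicks
      [{:site-id "439", :pid "184", :device-type "PC", :clicks 1}
       {:site-id "439", :pid "184", :device-type "Tablet", :clicks 2}
       {:site-id "100", :pid "200", :device-type "PC", :clicks 3}])

    ;; Shape 1: device types collected into a map of device-type -> clicks.
    (def by-device-map
      (map-combiner [:site-id :pid]
                    :device-types
                    (fn [m] {(:device-type m) (:clicks m)})
                    merge))

    ;; Shape 2: the non-join keys collected into a vector of submaps.
    (def by-grouped-data
      (map-combiner [:site-id :pid]
                    :grouped-data
                    (fn [m] [(dissoc m :site-id :pid)])
                    into))

    (by-device-map sample-clicks)
    ;; => ({:site-id "439", :pid "184", :device-types {"PC" 1, "Tablet" 2}}
    ;;     {:site-id "100", :pid "200", :device-types {"PC" 3}})

    (by-grouped-data sample-clicks)
    ;; => ({:site-id "439", :pid "184",
    ;;      :grouped-data [{:device-type "PC", :clicks 1}
    ;;                     {:device-type "Tablet", :clicks 2}]}
    ;;     {:site-id "100", :pid "200",
    ;;      :grouped-data [{:device-type "PC", :clicks 3}]})
    ```

    In both cases inspect wraps each row's contribution in a collection (a map or a vector), so that combine (merge or into) can join contributions from rows sharing the same group keys.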