
How to filter map content by path


I want to select paths of a deeply nested map to keep.

For example:

{:a 1
 :b {:c [{:d 1 :e 1} 
         {:d 2 :e 2}]
     :f 1}
 :g {:h {:i 4 :j [1 2 3]}}}

I want to select by paths, like so:

(select-paths m [[:a] 
                 [:b :c :e]
                 [:b :f]
                 [:g :h :i]])

This would return:

{:a 1
 :b {:c [{:e 1}
         {:e 2}]
     :f 1}
 :g {:h {:i 4}}}

This is essentially what Elasticsearch's fields parameter does. The paths argument could take some other format; this is just my first idea.

I tried two different approaches:

  1. Going through the entire map and checking whether the full path of the current element is among the given paths. I can't figure out how to handle lists of maps so that they stay lists of maps.
  2. Building select-keys calls from the given paths, but again I run into problems with lists of maps, especially when trying to resolve paths of varying depths that share a common prefix.

I looked at Specter, but I didn't see anything that would do this. Every map- or postwalk-based solution I come up with turns into something incredibly convoluted at some point. I must be thinking about this the wrong way.

If there's a way to do this with raw JSON, that would be fine as well, as would a Java solution.


Solution

  • One way of solving this problem is to generate the set of all selected paths together with all of their prefixes, and then write a recursive function that traverses the data structure while keeping track of the path to the current node. Sequential values are traversed element by element without extending the path, which is what keeps lists of maps as lists of maps. The code for this is fairly short:

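    (defn select-paths-from-set
      "Walks data, keeping only map entries whose full key path (current-path
      plus the key) is in path-set. Sequential values are walked element by
      element without extending the path, so lists of maps stay lists of maps."
      [current-path path-set data]
      (cond
        (map? data) (into {}
                          (remove nil?)
                          (for [[k v] data]
                            (let [p (conj current-path k)]
                              ;; Keep the entry only if its path is selected
                              ;; or is a prefix of a selected path.
                              (when (contains? path-set p)
                                [k (select-paths-from-set p path-set v)]))))
        ;; Vectors/lists: apply the same selection to every element.
        (sequential? data) (mapv (partial select-paths-from-set current-path path-set) data)
        ;; Leaves (numbers, strings, ...) are returned unchanged.
        :else data))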
    
    (defn select-paths [data paths]
      (select-paths-from-set []
                             (into #{}
                                   (mapcat #(take-while seq (iterate butlast %)))
                                   paths)
                             data))
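    
    ;; The set built in select-paths holds every selected path plus all of its
    ;; non-empty prefixes, so the intermediate maps along each path are kept.
    ;; butlast returns seqs rather than vectors, but a seq and a vector with the
    ;; same elements are = and hash alike, so contains? still matches the vector
    ;; paths that select-paths-from-set builds with conj. For example:
    (= #{[:b :c :e] [:b :c] [:b]}
       (into #{} (mapcat #(take-while seq (iterate butlast %))) [[:b :c :e]]))
    ;; => true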
    
    (select-paths {:a 1
                   :b {:c [{:d 1 :e 1} 
                           {:d 2 :e 2}]
                       :f 1}
                   :g {:h {:i 4 :j [1 2 3]}}}
                  [[:a] 
                   [:b :c :e]
                   [:b :f]
                   [:g :h :i]])
    ;; => {:a 1, :b {:c [{:e 1} {:e 2}], :f 1}, :g {:h {:i 4}}}
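Because the prefix set cannot tell a selected path apart from a mere prefix of one, selecting a non-leaf path such as [:g :h] with the code above returns {:g {:h {}}} rather than the whole :h subtree. If you would rather have a path that ends at a map keep everything below it (an assumption; the question does not say which behaviour it wants), one possible sketch is to turn the paths into a nested lookup tree first, with true marking where a selected path ends. The names paths->tree, select-tree and select-paths-tree are hypothetical, chosen only for this example:

```clojure
(defn paths->tree
  "Hypothetical helper: builds a nested map from the paths, e.g.
  [[:a] [:b :c :e] [:b :f]] => {:b {:c {:e true}, :f true}, :a true}.
  The value true marks the end of a selected path."
  [paths]
  ;; Insert longer paths first so that a shorter selected path (such as [:g])
  ;; overwrites any deeper branch below it instead of being overwritten.
  (reduce (fn [tree path] (assoc-in tree path true))
          {}
          (sort-by count > paths)))

(defn select-tree
  "Hypothetical helper: keeps the parts of data described by tree."
  [tree data]
  (cond
    ;; A selected path ends here: keep the whole value, whatever it is.
    (true? tree) data
    ;; Keep only the keys that appear in the tree, recursing into each.
    (map? data) (into {}
                      (for [[k v] data
                            :let [sub (get tree k)]
                            :when sub]
                        [k (select-tree sub v)]))
    ;; Lists of maps stay lists: apply the same subtree to every element.
    (sequential? data) (mapv #(select-tree tree %) data)
    ;; Scalars reached before the path ends are returned as-is.
    :else data))

(defn select-paths-tree
  "Hypothetical entry point taking the same arguments as select-paths."
  [data paths]
  (select-tree (paths->tree paths) data))

(def m {:a 1
        :b {:c [{:d 1 :e 1}
                {:d 2 :e 2}]
            :f 1}
        :g {:h {:i 4 :j [1 2 3]}}})

(select-paths-tree m [[:a] [:b :c :e] [:b :f] [:g :h :i]])
;; => {:a 1, :b {:c [{:e 1} {:e 2}], :f 1}, :g {:h {:i 4}}}

(select-paths-tree m [[:g :h]])
;; => {:g {:h {:i 4, :j [1 2 3]}}}
```

The nested tree makes "keep everything below here" easy to express, at the cost of an extra preprocessing step; like the set-based version, it never indexes into vectors, so a path always applies to every element of a list.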