
create spec from data


I am trying to create a spec just from data. I have a very complex data structure made entirely of nested maps.

{:contexts
 ({:importer.datamodel/global-id "01b4e69f86e5dd1d816e91da27edc08e",
   :importer.datamodel/type "province",
   :name "a1",
   :importer.datamodel/part-of "8cda1baed04b668a167d4ca28e3cef36"}
  {:importer.datamodel/global-id "8cda1baed04b668a167d4ca28e3cef36",
   :importer.datamodel/type "country",
   :name "AAA"}
  {:importer.datamodel/global-id "c78e5478e19f2d7c1b02088e53e8d8a4",
   :importer.datamodel/type "location",
   :importer.datamodel/center ["36." "2."],
   :importer.datamodel/part-of "01b4e69f86e5dd1d816e91da27edc08e"}
  {:importer.datamodel/global-id "88844f94f79c75acfcb957bb41386149",
   :importer.datamodel/type "organisation",
   :name "C"}
  {:importer.datamodel/global-id "102e96468e5d13058ab85c734aa4a949",
   :importer.datamodel/type "organisation",
   :name "A"}),
 :datasources
 ({:importer.datamodel/global-id "Source;ACLED",
   :name "ACLED",
   :url "https://www.acleddata.com"}),
 :iois
 ({:importer.datamodel/global-id "item-set;ACLED",
   :importer.datamodel/type "event",
   :datasource "Source;ACLED",
   :features
   ({:importer.datamodel/global-id
     "c74257292f584502f9be02c98829d9fda532a492e7dd41e06c31bbccc76a7ba0",
     :date "1997-01-04",
     :fulltext
     {:importer.datamodel/global-id "df5c7d6d075df3a7719ebdd39c6d4c7f",
      :text "bla"},
     :location-meanings
     ({:importer.datamodel/global-id
       "e5611219971164a15f06e07228fb7b51",
       :location "8cda1baed04b668a167d4ca28e3cef36",
       :contexts (),
       :importer.datamodel/type "position"}
      {:importer.datamodel/global-id
       "af36461d27ec1d8d28fd7f4a70ab7ce2",
       :location "c78e5478e19f2d7c1b02088e53e8d8a4",
       :contexts (),
       :importer.datamodel/type "position"}),
     :interaction-name "Violence",
     :importer.datamodel/type "description",
     :has-contexts
     ({:context "102e96468e5d13058ab85c734aa4a949",
       :context-association-type "actor",
       :context-association-name "actor-1",
       :priority "none"}
      {:context "88844f94f79c75acfcb957bb41386149",
       :context-association-type "actor",
       :context-association-name "actor-2",
       :priority "none"}),
     :facts
     ({:importer.datamodel/global-id
       "c46802ce6dcf33ca02ce113ffd9a855e",
       :importer.datamodel/type "integer",
       :name "fatalities",
       :value "16"}),
     :attributes
     ({:name "description",
       :importer.datamodel/type "string",
       :value "Violence"})}),
   :attributes (),
   :ioi-slice "per-item"})}

What tool can create the spec for such a structure? I am trying to use this tool: https://github.com/stathissideris/spec-provider

but it gives me this:

(spec/def :importer.datamodel/data
  (clojure.spec.alpha/coll-of
   (clojure.spec.alpha/or
    :collection
    (clojure.spec.alpha/coll-of
     (clojure.spec.alpha/keys
      :req
      [:importer.datamodel/global-id]
      :opt
      [:importer.datamodel/center
       :importer.datamodel/part-of
       :importer.datamodel/type]
      :opt-un
      [:importer.datamodel/attributes
       :importer.datamodel/datasource
       :importer.datamodel/features
       :importer.datamodel/ioi-slice
       :importer.datamodel/name
       :importer.datamodel/url]))
    :simple
    clojure.core/keyword?)))

which is not a complete solution. I call (sp/pprint-specs (sp/infer-specs data :importer.datamodel/data) 'data 's). What tool can create the spec for such a structure?


Solution

  • I am trying to use this tool: https://github.com/stathissideris/spec-provider

    spec-provider isn't giving you the desired result because your data is a complex nested/recursive structure. Some of those maps would be best spec'd with multi-specs, but spec-provider won't do that; one of the caveats in its docs says "There is no attempt to infer multi-spec."

    The only way to properly spec some of these maps is with multi-specs, because their spec depends on their :importer.datamodel/type value.

    First, let's look at the top-level keys (assuming the map is in a binding named data):

    (keys data) => (:contexts :datasources :iois)
    

    Create an s/keys spec for the outermost map (here and below, s is an alias for clojure.spec.alpha):

    (s/def ::my-map
      (s/keys :req-un [::contexts ::datasources ::iois]))
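
To see how :req-un pairs unqualified map keys with qualified spec names, here is a minimal sketch. The ::datasource-entry spec and the string? predicates are illustrative assumptions, not part of the original data model; the sample values come from the :datasources entry in the question.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical specs for one entry of :datasources (names are illustrative).
(s/def ::name string?)
(s/def ::url string?)
(s/def ::datasource-entry
  (s/keys :req [:importer.datamodel/global-id] ; qualified key, matched as written
          :req-un [::name ::url]))             ; matched against :name and :url

(def acled
  {:importer.datamodel/global-id "Source;ACLED"
   :name "ACLED"
   :url "https://www.acleddata.com"})

(s/valid? ::datasource-entry acled)               ;=> true
(s/valid? ::datasource-entry (dissoc acled :url)) ;=> false, :url is missing
(s/valid? ::datasource-entry (assoc acled :name 42)) ;=> false, :name is not a string
```

The qualified keyword names the registered spec, while the map itself only needs the unqualified key of the same name.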
    

    These keys are unqualified in the data, but :req-un takes qualified keywords: each one names a registered spec and is matched against the unqualified key of the same name. We can use the REPL to look at the shapes of nested maps and their relationships to :importer.datamodel/type, by walking the nested structure and collecting data:

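    (require 'clojure.walk)

    ;; Collect a [type keys] pair for every map found anywhere in data.
    (let [keysets (atom #{})]
      (clojure.walk/postwalk
        (fn [v]
          (when (map? v)
            (swap! keysets conj [(:importer.datamodel/type v) (keys v)]))
          v) ; return v unchanged so the walk leaves data as it is
        data)
      @keysets)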
    =>
    #{...
      ["organisation" (:importer.datamodel/global-id :importer.datamodel/type :name)]
      [nil (:context :context-association-type :context-association-name :priority)]
      ["description"
       (:importer.datamodel/global-id :date :fulltext :location-meanings
        :interaction-name :importer.datamodel/type :has-contexts :facts :attributes)]
      ["event" (:importer.datamodel/global-id :importer.datamodel/type :datasource :features :attributes :ioi-slice)]
     ...}
    

    (An upcoming spec alpha should make it easier to define specs programmatically from this data.)

    Multi-specs

    We can see there are some map shapes that don't have a :importer.datamodel/type, but we can write multi-specs for the ones that do. First define a multimethod for dispatching on the type key:

    (defmulti type-spec :importer.datamodel/type)
    

    Then write a defmethod for each :importer.datamodel/type value. Here are a few examples:

    (defmethod type-spec :default [_] (s/keys))
    (defmethod type-spec "organisation" [_]
      (s/keys :req [:importer.datamodel/global-id]
              :req-un [::name]))
    (defmethod type-spec "description" [_]
      (s/keys :req [:importer.datamodel/global-id]
              :req-un [::date ::fulltext ::location-meanings ::interaction-name
                       ::has-contexts ::facts ::attributes]))
    (defmethod type-spec "event" [_]
      (s/keys :req-un [::features]))
    

    Then define the s/multi-spec:

    (s/def ::datamodel
      (s/multi-spec type-spec :importer.datamodel/type))
    

    Now any map we conform to ::datamodel will resolve a spec based on its :importer.datamodel/type value. We can register that spec under the keywords whose values should be checked against it, e.g. one of the outermost keys:

    (s/def ::contexts (s/coll-of ::datamodel))
    

    Now if you remove a required key from one of the maps we spec'd under :contexts, spec can tell you what's wrong. For example, removing the :name key from an "organisation" map:

    (s/explain ::my-map data)
    In: [:contexts 3]
    val: #:importer.datamodel{:global-id "88844f94f79c75acfcb957bb41386149",
                              :type "organisation"}
    fails spec: :playground.so/datamodel
    at: [:contexts "organisation"]
    predicate: (contains? % :name)
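
The dispatch can also be checked in isolation. The following is a minimal sketch that uses hypothetical names (shape-spec, ::shape) so that it doesn't clash with the specs defined above; the sample map is the "organisation" entry from the question.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical multimethod and spec names, mirroring type-spec and ::datamodel.
(defmulti shape-spec :importer.datamodel/type)

(defmethod shape-spec "organisation" [_]
  (s/keys :req [:importer.datamodel/global-id]
          :req-un [::name]))

;; Types without their own method fall back to a permissive map spec.
(defmethod shape-spec :default [_] (s/keys))

(s/def ::shape (s/multi-spec shape-spec :importer.datamodel/type))

(def org
  {:importer.datamodel/global-id "88844f94f79c75acfcb957bb41386149"
   :importer.datamodel/type "organisation"
   :name "C"})

(s/valid? ::shape org)                                  ;=> true
(s/valid? ::shape (dissoc org :name))                   ;=> false
(s/valid? ::shape {:importer.datamodel/type "country"}) ;=> true, via :default
```

Note that the :default method accepts any map, so a type that has no method yet passes validation silently.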
    

    Other specs

    For the maps that don't have a :importer.datamodel/type you should be able to define a key spec. For example, the nested :has-contexts key has a collection of maps without a :importer.datamodel/type, but if we can assume they'll all be similar we can write this spec:

    (s/def ::has-contexts
      (s/coll-of (s/keys :req-un [::context ::context-association-type
                                  ::context-association-name ::priority])))
    

    :has-contexts is in a map we've already covered with a multi-spec above, and simply registering a spec under this key will make spec conform its values. The outermost key that contains this spec is :iois, so we can spec that key too:

    (s/def ::iois (s/coll-of ::datamodel))
    

    Now, conforming an input to the ::my-map spec will automatically cover more of the data.

    What tool can create the spec for such a structure?

    As you can see, writing a full spec for this structure is non-trivial but possible. I don't know of any existing tool that could automatically infer a complete, "correct" spec for this structure. Such a tool would have to intuit that :importer.datamodel/type is a key that could be used to dispatch to different s/keys specs, and it would still be making a potentially invalid assumption. I think tool-assisted spec generation is more realistic and practical in this case.
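
As one possible form of tool assistance, here is a sketch of a helper that summarises which keys appear on every map of a given type and which appear only on some. The names maps-in and key-report are hypothetical, and treating "present on every sample" as "required" is an assumption that only holds if the sample data is representative.

```clojure
(require '[clojure.set :as set])

(defn maps-in
  "Returns every map nested anywhere inside x, including x itself."
  [x]
  (filter map? (tree-seq coll? seq x)))

(defn key-report
  "Groups nested maps by :importer.datamodel/type (nil for untyped maps) and
  splits the keys seen into those present on every map of that type
  (:required) and those present on only some of them (:optional)."
  [data]
  (into {}
        (for [[t ms] (group-by :importer.datamodel/type (maps-in data))
              :let [keysets  (map (comp set keys) ms)
                    required (apply set/intersection keysets)
                    seen     (apply set/union keysets)]]
          [t {:required required
              :optional (set/difference seen required)}])))

;; A small excerpt of the data from the question.
(def sample
  {:contexts
   (list {:importer.datamodel/global-id "88844f94f79c75acfcb957bb41386149"
          :importer.datamodel/type "organisation"
          :name "C"}
         {:importer.datamodel/global-id "102e96468e5d13058ab85c734aa4a949"
          :importer.datamodel/type "organisation"
          :name "A"}
         {:importer.datamodel/global-id "c78e5478e19f2d7c1b02088e53e8d8a4"
          :importer.datamodel/type "location"
          :importer.datamodel/center ["36." "2."]
          :importer.datamodel/part-of "01b4e69f86e5dd1d816e91da27edc08e"})})

(get-in (key-report sample) ["organisation" :required])
;; => a set containing :importer.datamodel/global-id, :importer.datamodel/type and :name
```

The report is only a starting point for writing the defmethod bodies by hand; it says nothing about the values behind each key.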