clojure incanter

How can I create an Incanter series based on a range of values?


I've got an Incanter dataset with 3 columns: a date/timestamp, a response time and a message size. What I'd like to do is create a scatter plot with the date/timestamp on the x axis and the response time on the y axis.

This is easy enough, but I'd like to generate separate series of data based on the message size column. Incanter's scatter-plot function takes a :group-by option, but it appears to only handle discrete values. I'd like the series to be generated by applying some function to the message size column. Some function like:

(fn [n]
  (cond
    (< n 5000)                    "small"
    (and (>= n 5000) (< n 20000)) "medium"
    (>= n 20000)                  "large"))

Is this possible or is there a better way to accomplish the same thing?


Solution

  • You can synthesize a dataset with a new column holding the discrete values calculated by your function, something like this:

    ;; load the Incanter namespaces used below
    (use '(incanter core charts))

    (def dataset1 (dataset 
                   [:x :y] 
                   (for [x (range 10) y (range 10)] [x y])))
    ;=> #'user/dataset1
    
    dataset1
    [:x :y]
    [0 0]
    [0 1]
    ...
    [9 8]
    [9 9]
    
    (def dataset2 (with-data dataset1 
      (conj-cols $data 
         (dataset [:size] ($map #(cond
                                  (< % 3)   "small"
                                  (<= 3 % 6) "medium"
                                  (< 6 %)   "large") :x)))))
    ;=> #'user/dataset2
    
    dataset2
    [:x :y :size]
    [0 0 "small"]
    [0 1 "small"]
    ...
    [9 8 "large"]
    [9 9 "large"]
    

    And then use :group-by on the discrete value you've generated:

    (with-data dataset2 
       (view 
          (scatter-plot 
           :x 
           :y 
           :group-by :size )))
    

    To give something like this:

    [image: incanter plot]

    A variant which generates the group-by from two columns:

    (def dataset3
      (with-data dataset1
        (conj-cols
          $data
          ;; $map passes one value per listed column, so % is :x and %2 is :y
          (dataset [:size] ($map #(let [sum (+ % %2)]
                                    (cond
                                      (< sum 4)     "small"
                                      (<= 4 sum 12) "medium"
                                      (< 12 sum)    "large")) [:x :y])))))
    

    Which plots like this:

    [image: sum plot]
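
Applied to the dataset described in the question, the same idea might look like the sketch below. The column names (:timestamp, :response-time, :message-size), the sample rows, and the helper name size-bucket are assumptions made for illustration and do not come from the original post. It also uses Incanter's add-derived-column in place of conj-cols with $map, which should give the same result in a single step.

```clojure
(use '(incanter core charts))

;; Bucketing function from the question (hypothetical name). cond stops at
;; the first matching clause, so the lower bounds do not need repeating.
(defn size-bucket [n]
  (cond
    (< n 5000)  "small"
    (< n 20000) "medium"
    :else       "large"))

;; Made-up sample rows: timestamp in epoch millis, response time, message size.
(def responses
  (dataset [:timestamp :response-time :message-size]
           [[1300000000000 120 1200]
            [1300000060000 340 7500]
            [1300000120000 910 48000]]))

;; Append a :size column derived from :message-size.
(def responses-with-size
  (add-derived-column :size [:message-size] size-bucket responses))

;; Build the chart with one series per distinct :size value.
(def chart
  (scatter-plot :timestamp :response-time
                :group-by :size
                :data responses-with-size))

;; Uncomment to open the plot window:
;; (view chart)
```

Because the bucketing logic lives in an ordinary function, the thresholds can be checked at the REPL before any plotting is involved.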