I'm working on a personal market analysis project. I've got a data structure representing all the recent turning points in the market, that looks like this:
[{:high 1.121455, :time "2016-08-03T05:15:00.000000Z"}
{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}
{:high 1.12173, :time "2016-08-03T04:30:00.000000Z"}
{:high 1.121925, :time "2016-08-03T00:00:00.000000Z"}
{:high 1.12215, :time "2016-08-02T23:00:00.000000Z"}
{:high 1.12273, :time "2016-08-02T21:15:00.000000Z"}
{:high 1.12338, :time "2016-08-02T18:15:00.000000Z"}
{:low 1.119215, :time "2016-08-02T12:30:00.000000Z"}
{:low 1.118755, :time "2016-08-02T12:00:00.000000Z"}
{:low 1.117575, :time "2016-08-02T06:00:00.000000Z"}
{:low 1.117135, :time "2016-08-02T04:30:00.000000Z"}
{:low 1.11624, :time "2016-08-02T02:00:00.000000Z"}
{:low 1.115895, :time "2016-08-01T21:30:00.000000Z"}
{:low 1.11552, :time "2016-08-01T11:45:00.000000Z"}
{:low 1.11049, :time "2016-07-29T12:15:00.000000Z"}
{:low 1.108825, :time "2016-07-29T08:30:00.000000Z"}
{:low 1.10839, :time "2016-07-29T08:00:00.000000Z"}
{:low 1.10744, :time "2016-07-29T05:45:00.000000Z"}
{:low 1.10716, :time "2016-07-28T19:30:00.000000Z"}
{:low 1.10705, :time "2016-07-28T18:45:00.000000Z"}
{:low 1.106875, :time "2016-07-28T18:00:00.000000Z"}
{:low 1.10641, :time "2016-07-28T05:45:00.000000Z"}
{:low 1.10591, :time "2016-07-28T01:45:00.000000Z"}
{:low 1.10579, :time "2016-07-27T23:15:00.000000Z"}
{:low 1.105275, :time "2016-07-27T22:00:00.000000Z"}
{:low 1.096135, :time "2016-07-27T18:00:00.000000Z"}]
Conceptually, I want to match up :high/:low pairs, work out the price range (high-low) and midpoint (average of high & low), but I don't want every possible pair to be generated.
What I want to do is start from the 1st item in the collection, {:high 1.121455, :time "2016-08-03T05:15:00.000000Z"}, and walk "down" through the remainder of the collection, creating a pair with every :low item UNTIL I hit the next :high item. Once I hit that next :high item, I'm not interested in any further pairs. In this case, there's only a single pair created, which is the :high and the 1st :low - I stop there because the next (3rd) item is a :high. The 1 generated record should look like
{:price-range 0.000365, :midpoint 1.1212725, :extremes [{:high 1.121455, :time "2016-08-03T05:15:00.000000Z"} {:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}]}
Next, I'd move on to the 2nd item in the collection, {:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}, and walk "down" through the remainder of the collection, creating a pair with every :high item UNTIL I hit the next :low item. In this case, I get 5 new records generated, being the :low and the next 5 :high items, which are all consecutive; the first of these 5 records would look like
{:price-range 0.00064, :midpoint 1.12141, :extremes [{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"} {:high 1.12173, :time "2016-08-03T04:30:00.000000Z"}]}
the second of these 5 records would look like
{:price-range 0.000835, :midpoint 1.1215075, :extremes [{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"} {:high 1.121925, :time "2016-08-03T00:00:00.000000Z"}]}
and so on. After that, I get a :low, so I stop there.
Then I'd move on to the 3rd item, {:high 1.12173, :time "2016-08-03T04:30:00.000000Z"}, and walk "down", creating pairs with every :low UNTIL I hit the next :high. In this case, I get 0 pairs generated, because the :high is followed immediately by another :high. Same for the next 3 :high items, which are all followed immediately by another :high.
Next I get to the 7th item, {:high 1.12338, :time "2016-08-02T18:15:00.000000Z"}, and that should generate a pair with each of the following 19 :low items.
My generated result would be a list of all the pairs created:
[{:price-range 0.000365, :midpoint 1.1212725, :extremes [{:high 1.121455, :time "2016-08-03T05:15:00.000000Z"} {:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}]}
{:price-range 0.00064, :midpoint 1.12141, :extremes [{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"} {:high 1.12173, :time "2016-08-03T04:30:00.000000Z"}]}
...
If I was implementing this using something like Python, I'd probably use a couple of nested loops, use a break to exit the inner loop when I stopped seeing :highs to pair with my :low and vice-versa, and accumulate all the generated records into an array as I traversed the 2 loops. I just can't work out a good way to attack it using Clojure...
Any ideas?
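first of all, you can rephrase this the following way: find every point where a :high is followed by a :low, or vice versa, and then pair the item just before that point with every item after it, up to the next such point.
for simplicity, let's use the following data model: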
(def data0 [{:a 1} {:b 2} {:b 3} {:b 4} {:a 5} {:a 6} {:a 7}])
the first part can be achieved with the partition-by function, which splits the input collection every time the function changes its value for the processed item:
user> (def step1 (partition-by (comp boolean :a) data0))
#'user/step1
user> step1
(({:a 1}) ({:b 2} {:b 3} {:b 4}) ({:a 5} {:a 6} {:a 7}))
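To see how the same call behaves on maps shaped like the real data, here is a small sketch. The `sample` vector and its prices are made up for illustration; they are not the data from the question.

```clojure
;; Hypothetical sample, shaped like the turning-point maps in the question.
(def sample [{:high 3} {:low 1} {:high 4} {:high 5} {:low 2}])

;; (comp boolean :high) is true for :high items and false for :low items,
;; so a new group starts whenever the kind of extreme changes.
(def groups (partition-by (comp boolean :high) sample))

groups
;; => (({:high 3}) ({:low 1}) ({:high 4} {:high 5}) ({:low 2}))
```

The `boolean` matters here: `(partition-by :high sample)` would compare the prices themselves, so `{:high 4}` and `{:high 5}` would land in separate groups.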
now you need to take every two consecutive groups and manipulate them. the groups should be like this: [({:a 1}) ({:b 2} {:b 3} {:b 4})] [({:b 2} {:b 3} {:b 4}) ({:a 5} {:a 6} {:a 7})]
this is achieved by the partition function:
user> (def step2 (partition 2 1 step1))
#'user/step2
user> step2
((({:a 1}) ({:b 2} {:b 3} {:b 4}))
(({:b 2} {:b 3} {:b 4}) ({:a 5} {:a 6} {:a 7})))
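In that call the first number is the window size and the second is the step. A sketch on plain keywords (placeholder names, not real groups) may make the overlap easier to see:

```clojure
;; Size 2, step 1: each inner element appears in two overlapping windows.
(def windows (partition 2 1 [:g1 :g2 :g3 :g4]))
;; => ((:g1 :g2) (:g2 :g3) (:g3 :g4))

;; With the default step (equal to the size) the windows do not overlap.
(def chunks (partition 2 [:g1 :g2 :g3 :g4]))
;; => ((:g1 :g2) (:g3 :g4))

;; A single group yields no windows at all, so no pairs would be produced.
(def single (partition 2 1 [:g1]))
;; => ()
```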
you have to do something for every pair of groups. You could do it with map:
user> (def step3 (map (fn [[lbounds rbounds]]
                        (map #(vector (last lbounds) %)
                             rbounds))
                      step2))
#'user/step3
user> step3
(([{:a 1} {:b 2}] [{:a 1} {:b 3}] [{:a 1} {:b 4}])
([{:b 4} {:a 5}] [{:b 4} {:a 6}] [{:b 4} {:a 7}]))
but since you need the concatenated list, rather than the grouped one, you would want to use mapcat instead of map:
user> (def step3 (mapcat (fn [[lbounds rbounds]]
                           (map #(vector (last lbounds) %)
                                rbounds))
                         step2))
#'user/step3
user> step3
([{:a 1} {:b 2}]
[{:a 1} {:b 3}]
[{:a 1} {:b 4}]
[{:b 4} {:a 5}]
[{:b 4} {:a 6}]
[{:b 4} {:a 7}])
that's almost the result we want (almost, because we generate plain vectors instead of maps).
now you could prettify it with the threading macro:
(->> data0
     (partition-by (comp boolean :a))
     (partition 2 1)
     (mapcat (fn [[lbounds rbounds]]
               (map #(vector (last lbounds) %)
                    rbounds))))
which gives you exactly the same result.
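If you want to reuse the pipeline, it could be wrapped in a function along these lines. This is only a sketch: the name `boundary-pairs` and its `kind?` parameter are made up here, not part of clojure.core.

```clojure
(def data0 [{:a 1} {:b 2} {:b 3} {:b 4} {:a 5} {:a 6} {:a 7}])

(defn boundary-pairs
  "Hypothetical helper: pairs the last item of each run with every item
  of the run that follows it. kind? decides which run an item belongs to."
  [kind? coll]
  (->> coll
       (partition-by (comp boolean kind?))   ; split into runs of one kind
       (partition 2 1)                       ; adjacent runs, overlapping
       (mapcat (fn [[lbounds rbounds]]
                 (map #(vector (last lbounds) %) rbounds)))))

(boundary-pairs :a data0)
;; => ([{:a 1} {:b 2}] [{:a 1} {:b 3}] [{:a 1} {:b 4}]
;;     [{:b 4} {:a 5}] [{:b 4} {:a 6}] [{:b 4} {:a 7}])
```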
applied to your data, it would look almost the same (with a different result-generating fn):
user> (defn hi-or-lo [item]
        ;; a map called as a function looks up the key; the second
        ;; argument is the default returned when :high is absent
        (item :high (item :low)))
#'user/hi-or-lo
user>
;; data is assumed to be bound to the vector from the question
(->> data
     (partition-by (comp boolean :high))
     (partition 2 1)
     (mapcat (fn [[lbounds rbounds]]
               (let [left-bound (last lbounds)
                     left-val (hi-or-lo left-bound)]
                 (map #(let [right-val (hi-or-lo %)
                             diff (Math/abs (- right-val left-val))]
                         {:extremes [left-bound %]
                          :price-range diff
                          :midpoint (+ (min right-val left-val)
                                       (/ diff 2))})
                      rbounds))))
     (clojure.pprint/pprint))
it prints the following:
({:extremes
  [{:high 1.121455, :time "2016-08-03T05:15:00.000000Z"}
   {:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}],
  :price-range 3.6500000000017074E-4,
  :midpoint 1.1212725}
 {:extremes
  [{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}
   {:high 1.12173, :time "2016-08-03T04:30:00.000000Z"}],
  :price-range 6.399999999999739E-4,
  :midpoint 1.12141}
 {:extremes
  [{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}
   {:high 1.121925, :time "2016-08-03T00:00:00.000000Z"}],
  :price-range 8.350000000001412E-4,
  :midpoint 1.1215074999999999}
 {:extremes
  [{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}
   {:high 1.12215, :time "2016-08-02T23:00:00.000000Z"}],
  :price-range 0.001060000000000061,
  :midpoint 1.12162}
 {:extremes
  [{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}
   {:high 1.12273, :time "2016-08-02T21:15:00.000000Z"}],
  :price-range 0.0016400000000000858,
  :midpoint 1.12191}
 {:extremes
  [{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}
   {:high 1.12338, :time "2016-08-02T18:15:00.000000Z"}],
  :price-range 0.0022900000000001253,
  :midpoint 1.1222349999999999}
 {:extremes
  [{:high 1.12338, :time "2016-08-02T18:15:00.000000Z"}
   {:low 1.119215, :time "2016-08-02T12:30:00.000000Z"}],
  :price-range 0.004164999999999974,
  :midpoint 1.1212975}
 {:extremes
  [{:high 1.12338, :time "2016-08-02T18:15:00.000000Z"}
   {:low 1.118755, :time "2016-08-02T12:00:00.000000Z"}],
  :price-range 0.004625000000000101,
  :midpoint 1.1210675}
...
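For anyone who wants to run this without the full data set, here is a self-contained sketch on the first four turning points from the question. The names `sample-data`, `price`, `make-record` and `extreme-pairs` are made up for this example, and the midpoint is computed as a plain average rather than min + diff/2; the two agree up to floating-point noise.

```clojure
;; First four turning points from the question, under a hypothetical name.
(def sample-data
  [{:high 1.121455, :time "2016-08-03T05:15:00.000000Z"}
   {:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}
   {:high 1.12173, :time "2016-08-03T04:30:00.000000Z"}
   {:high 1.121925, :time "2016-08-03T00:00:00.000000Z"}])

(defn price
  "Returns the :high value if present, otherwise the :low value."
  [item]
  (or (:high item) (:low item)))

(defn make-record
  "Builds one result map from a pair of turning points."
  [left right]
  (let [a (price left)
        b (price right)]
    {:extremes [left right]
     :price-range (Math/abs (- a b))
     :midpoint (/ (+ a b) 2)}))

(defn extreme-pairs [items]
  (->> items
       (partition-by (comp boolean :high))
       (partition 2 1)
       (mapcat (fn [[lbounds rbounds]]
                 (map #(make-record (last lbounds) %) rbounds)))))

(def records (extreme-pairs sample-data))

(count records)
;; => 3
(map #(mapv price (:extremes %)) records)
;; => ([1.121455 1.12109] [1.12109 1.12173] [1.12109 1.121925])
```

The doubles in :price-range and :midpoint carry the usual floating-point noise, so compare them with a tolerance rather than with =.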
As an answer to the question about "complex data manipulation", I would advise you to look through all the collection-manipulating functions in the Clojure core, and then try to decompose any task into applications of those. There are not many cases where you need something beyond them.
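As one more illustration of that kind of decomposition, the nested-loops-with-break idea from the question can also be sketched directly with core functions: iterate and rest stand in for the outer loop, and take-while stands in for the break in the inner loop. The name `walk-pairs` and the sample prices are made up for this example.

```clojure
(defn walk-pairs
  "Hypothetical helper: pairs each item with the following items of the
  opposite kind, stopping at the first item of the same kind."
  [items]
  (mapcat (fn [[x & more]]
            ;; the key that marks an item as the same kind as x
            (let [same-kind? (if (:high x) :high :low)]
              (map #(vector x %)
                   (take-while (complement same-kind?) more))))
          ;; every non-empty tail of the collection: the outer loop
          (take-while seq (iterate rest items))))

(walk-pairs [{:high 3} {:low 1} {:high 4} {:high 5} {:low 2}])
;; => ([{:high 3} {:low 1}] [{:low 1} {:high 4}]
;;     [{:low 1} {:high 5}] [{:high 5} {:low 2}])
```

On this sample it produces the same pairs as the partition-by pipeline; which of the two reads better is mostly a matter of taste.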