Search code examples
clojureapache-sparkflambo

Results of updating a vector in clojure


I have a function in developed in clojure using flambo spark api functions

(:require [flambo.api :as f]
          [clojure.string :as s])

(defn get-distinct-column-val
 "input = {:col val}"
 [ xctx input ]
 (let [{:keys [ col ]} input
       column-values []
       result (f/map (:rdd xctx) (f/fn [row]
                                   (if (s/blank? (get row col))
                                       nil
                                       (assoc column-values (get row col)))))]
   (println "Col values: " column-values)
   (distinct column-values)))

And I try to print out the values of column-values and I'm getting

 Col values:  []

Is there any reason as to why this is so?

I tried replacing the println in the above function with this:

(println "Result: " result)

and got the following:

 #<JavaRDD MapPartitionsRDD[16] at map at NativeMethodAccessorImpl.java:-2>

Thanks!


Solution

  • Nothing in your code alters the column-values binding. I'm not sure how flambo works in the specifics here, but you should be looking at result, not column-values.

    assoc takes two arguments - an associative collection and a position. I suspect that here you actually want conj. Neither assoc or conj alters the collection it is provided - we are using immutable data types here.

    I expect that accessing result won't yet have the answer you expect, because you expect assoc to build up a value across each call (differently from the result from f/map). In this case, you probably want reduce.