How smart is Clojure with regexp parsing/compiling?


Imagine I have a Clojure function similar to this one:

(defn ABC
  [tag v]
  (let [value (->> (str/split v #",")
                   (map str/trim)
                   (map u/capitalize)
                   (str/join ", "))]
    ...))

That function could be called many times. Will the regexp #"," be parsed only once (I hope so)? Are there any pointers to source code or other proof of that?

What if I have a second function with the same regexp? Would it technically be the same regexp instance as in the first function, or would there be two independent regexps?

(defn XYZ [v]
  (let [adr (str/split v #",")]
    (if (> (count adr) 5)
      ...)))

Solution

  • For a function:

    (defn --xx-- [] (s/split "foo" #"bar"))
    

    clj-decompiler produces this Java for it:

    import clojure.lang.*;
    import java.util.regex.*;
    
    public final class scheduler$__xx__ extends AFunction
    {
        public static final Var const__0;
        public static final Object const__1;
        
        public static Object invokeStatic() {
            return ((IFn)scheduler$__xx__.const__0.getRawRoot()).invoke("foo", scheduler$__xx__.const__1);
        }
        
        @Override
        public Object invoke() {
            return invokeStatic();
        }
        
        static {
            const__0 = RT.var("clojure.string", "split");
            const__1 = Pattern.compile("bar");
        }
    }
    

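    As you can see, the literal is compiled into a Pattern once, in the class's static initializer block, and every call reuses that instance through const__1. So it is fine to write the regexp as a literal inside the function.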

    A second function is compiled to its own class and therefore gets its own Pattern object. If you want to share the instance, you have to do so yourself, for example by binding it to a var:

    (def pat #"bar")
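The behaviour described above can be checked at the REPL with identical?, which compares object identity. The following is a minimal sketch, not taken from the original answer: the namespace and the names comma-pat, split-a, split-b, literal-a and literal-b are hypothetical and exist only for illustration.

```clojure
(ns pattern-sharing-sketch
  (:require [clojure.string :as str]))

;; Shared pattern: the Pattern object is created once, when the var is defined.
(def comma-pat #",")

;; Both functions refer to the same Pattern instance through the var.
(defn split-a [v] (str/split v comma-pat))
(defn split-b [v] (str/split v comma-pat))

;; Hypothetical helpers that only return their own regex literal,
;; so the instances can be compared from outside.
(defn literal-a [] #",")
(defn literal-b [] #",")

(identical? (literal-a) (literal-a)) ;; => true, one function reuses its constant
(identical? (literal-a) (literal-b)) ;; => false, each function has its own Pattern
(split-a "a,b")                      ;; => ["a" "b"]
```

Sharing through a var mostly saves one extra Pattern object per function. Since each literal is already compiled only once per function, the practical gain is usually small, and the main benefit is keeping the regexp defined in a single place.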