
Find the exact usage of a function name in a string with re-seq and re-pattern - Clojure


My f function is not working as I expected. I would like to find exact matches in the given string, but f behaves strangely:

Roughly, this is what I am trying to do: pass a local repo path as an argument, read (with slurp) all the clj, cljs and cljc files in that location into a string, and then count how many times each Clojure core function is used. I hope that explains clearly what I am trying to do.
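A minimal sketch of the file-collecting part, assuming the goal is one big string per repo. read-clojure-sources is a hypothetical name, and the regex on the file name is just one way to pick out .clj, .cljs and .cljc files:

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

;; Hypothetical helper: concatenates the contents of every .clj, .cljs and
;; .cljc file found anywhere under repo-path into a single string.
(defn read-clojure-sources [repo-path]
  (->> (file-seq (io/file repo-path))                  ; the directory itself plus everything below it
       (filter #(.isFile ^java.io.File %))             ; skip directories
       (filter #(re-find #"\.clj[cs]?$" (.getName ^java.io.File %)))
       (map slurp)                                     ; read each file as a string
       (string/join "\n")))
```

The result could then be passed as the text argument of f, e.g. (f "map" (read-clojure-sources "/path/to/repo")), where the path is only a placeholder.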

Edit 1: I am going to try to implement f as a pure function. Edit 2: Sorry, I have reorganized the question as an MRE (minimal reproducible example).

(defn f "Counts how many times regex matches in text."
  [regex text]
  (count (re-seq (re-pattern regex) text)))


(def function-text
  "str-path  str")

(f "str" function-text)
=> 2
;expected => 1
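To see where the two matches come from, one can ask the underlying java.util.regex.Matcher for the position of each match. match-positions is a hypothetical helper, not part of clojure.core:

```clojure
;; Hypothetical helper: returns [start end] index pairs for every match of re in text.
(defn match-positions [re text]
  (let [^java.util.regex.Matcher m (re-matcher re text)]
    (loop [acc []]
      (if (.find m)
        (recur (conj acc [(.start m) (.end m)]))
        acc))))

(match-positions #"str" "str-path  str")
;; => [[0 3] [10 13]]
```

The first match is the prefix of str-path and the second is the standalone str: re-seq returns every substring that matches and has no notion of where a symbol begins or ends.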

 

Solution

  • To make it clearer, I'm posting this as an answer, not just a comment.

    Given this definition:

    (defn f "Counts how many times pattern matches in text."
      [pattern text]
      (count (re-seq (re-pattern pattern) text)))
    
    

    and using this text

    (def function-text 
      "(defn idx->meta [pair-col]
         (->> pair-col
              (apply hash-map)
              (reduce-kv (fn [acc k v]
                           (let [idx       k
                                 str-vals  (filterv string? (vals v))
                                 str-paths (->> str-vals
                                                (map #(clojure.string/split % #\"\")))]
                             (->> str-paths
                                  (reduce (fn [acc str-path]
                                            (update-in acc str-path (fnil conj []) v))
                                          acc))))
                         {})
              )
         )")
    
    

    You can do some experiments:

    (f "with-open" function-text)
    ;; => 0
    (f "int?" function-text)
    ;; => 3
    (f "int\\?" function-text)
    ;; => 0
    
    

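    In particular, ? is a special character in regexes: it means "zero or one occurrence" of the preceding element. So "int?" matches both "in" and "int", which is why it finds 3 matches in the text above (in string?, clojure.string and update-in) even though "int?" never appears there. This is not specific to Clojure or Java; it's standard regex behavior.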
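    A quick REPL check of the difference, using a made-up input string. Note that inside a regex literal the escape is written \?, whereas the string passed to re-pattern needs "int\\?":

    ```clojure
    ;; ? makes the preceding t optional, so "in" (from update-in) matches too
    (re-seq #"int?" "update-in int? int")
    ;; => ("in" "int" "int")

    ;; an escaped \? matches a literal question mark
    (re-seq #"int\?" "update-in int? int")
    ;; => ("int?")
    ```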

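    UPDATE: exact matching of symbols

    Exact matching might be a bit tricky. re-seq counts every substring that matches, so in the question's example "str" is found both inside "str-path" and as the standalone str. This could work instead, by excluding all the characters that are valid in Clojure symbols: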

    (defn f
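      "Counts the occurrences of pattern in text that are not part of a longer symbol."
      [pattern text]
      ;; See https://clojure.org/reference/reader#_symbols for valid characters in Clojure symbols.
      ;; The negative lookbehind (?<!...) and lookahead (?!...) reject a match preceded or followed
      ;; by a symbol character without consuming it, so matches at the start or end of text count too.
      (let [sym-char "[a-zA-Z0-9*+!\\-_'?<>=]"]
        (count (re-seq (re-pattern (str "(?<!" sym-char ")\\Q" pattern "\\E(?!" sym-char ")")) text))))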
    

    It also wraps the pattern in "\Q" and "\E", which has the same effect as Pattern/quote (see cfrick's comment): everything in between, including ?, is matched literally.
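    For comparison, a small sketch using java.util.regex.Pattern/quote directly (the input string is made up). Pattern/quote produces the \Q...\E form itself and also handles the rare case where the input already contains \E:

    ```clojure
    (import 'java.util.regex.Pattern)

    ;; manual quoting, as in f above
    (def manual (re-pattern (str "\\Q" "int?" "\\E")))
    ;; Pattern/quote builds the equivalent quoted regex string for us
    (def quoted (re-pattern (Pattern/quote "int?")))

    (Pattern/quote "int?")
    ;; => "\\Qint?\\E"
    (re-seq manual "update-in int? int")
    ;; => ("int?")
    (re-seq quoted "update-in int? int")
    ;; => ("int?")
    ```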

    By the way, for a more serious analysis I'd look at clj-kondo, https://github.com/clj-kondo/clj-kondo, and I have some quick experiments with it here: https://github.com/jumarko/clojure-experiments/blob/develop/src/clojure_experiments/linters/clj_kondo.clj#L1