What is the optimal way to couple multiple regular expressions within a Clojure function? I believe the function would start out as such:
(defn foo [x]
  (re-seq #"some means to combine multiple regex" x))
but I am not clear whether this will work, or how efficient such a function would be. As an example of combining regexes, consider a function that searches for both domain names and IP addresses. For domain names I'd use a regex such as:
(re-seq #"\b([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}\b" x)
and for IP addresses:
(re-seq #"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b" x)
Regexes already allow for alternation with the | operator.
user=> (re-seq #"\d+" "123 foo 345 bar")
("123" "345")
user=> (re-seq #"[a-zA-Z]+" "123 foo 345 bar")
("foo" "bar")
user=> (re-seq #"\d+|[a-zA-Z]+" "123 foo 345 bar")
("123" "foo" "345" "bar")
You can programmatically union the regex patterns, if desired, by wrapping each one in a non-capturing group and interposing the | operator.
(defn union-re-patterns [& patterns]
  (re-pattern (apply str (interpose "|" (map #(str "(?:" % ")") patterns)))))
user=> (union-re-patterns #"\d+" #"[a-zA-Z]+")
#"(?:\d+)|(?:[a-zA-Z]+)"
user=> (re-seq (union-re-patterns #"\d+" #"[a-zA-Z]+") "123 foo 345 bar")
("123" "foo" "345" "bar")
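Applied to the domain/IP example from the question, the same approach might look like the sketch below. The names domain-re, ip-re, host-re and find-hosts are hypothetical, and the domain pattern's inner groups have been changed to non-capturing (an assumption on my part, not part of the original regex) so that re-seq returns plain strings rather than vectors of groups.

```clojure
;; Repeated here so the snippet is self-contained.
(defn union-re-patterns [& patterns]
  (re-pattern (apply str (interpose "|" (map #(str "(?:" % ")") patterns)))))

;; Domain pattern from the question, with its groups made non-capturing
;; (assumption: we only want the whole match, not the sub-groups).
(def domain-re
  #"\b(?:[a-zA-Z0-9](?:[a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}\b")

;; IP pattern from the question, unchanged.
(def ip-re
  #"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b")

;; Build the combined pattern once, at load time, rather than on every call.
(def host-re (union-re-patterns domain-re ip-re))

;; Hypothetical helper: returns every domain name or IP address in text.
(defn find-hosts [text]
  (re-seq host-re text))

(find-hosts "ping example.com then 192.168.1.1")
;; => ("example.com" "192.168.1.1")
```

Defining host-re with def means the union is compiled a single time; calling union-re-patterns inside find-hosts would recompile the pattern on each call. Note that alternation tries the branches left to right at each position, so the order of the arguments can matter when two patterns can match at the same spot.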