
How to mimic something like "set intersection" and "set union" with Clojure(Script) strings?


Suppose S and T are strings defined respectively as follows:

;; S
A
B
C

;; T
B
C
D

Are there analogous Clojure(Script) operations, say string-intersection and string-union (for lack of better names), that satisfy the following?

(string-intersection S T)
;; => 
;; B
;; C

and

(string-union S T)
;; => 
;; A
;; B
;; C
;; D

As you can see, string-intersection would eliminate (on a line-by-line basis) non-matching lines (leaving only the lines that match), while string-union has the effect of combining lines and ignoring duplicates.

NOTE: I'm using ClojureScript, but I imagine the answer would generalize to Clojure as well.


Solution

  • From your description, it seems you would like to treat each string as a set of lines and compute the intersection and union of those sets.

    For working with sets, you can use the clojure.set namespace.

    First, convert each string into a set of lines:

    (require '[clojure.string :as str]
             '[clojure.set :as set])
    
    (def s "A\nB\nC")
    (def t "B\nC\nD")
    
    (def s-set (into #{} (str/split-lines s)))
    (def t-set (into #{} (str/split-lines t)))
    

    Then you can calculate your union and intersection:

    (def s-t-union (set/union s-set t-set))
    ;; => #{"C" "B" "A" "D"}
    
    (def s-t-intersection (set/intersection s-set t-set))
    ;; => #{"C" "B"}
    

    Sets are unordered, so sort the results to get a predictable line order:

    (def s-t-union-sorted (sort s-t-union))
    ;; => ("A" "B" "C" "D")
    
    (def s-t-intersection-sorted (sort s-t-intersection))
    ;; => ("B" "C")
    

    You can also convert each result back into a single string of lines:

    (str/join "\n" s-t-union-sorted)
    ;; => "A\nB\nC\nD"
    
    (str/join "\n" s-t-intersection-sorted)
    ;; => "B\nC"
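
The steps above can be wrapped into two small functions. This is only a sketch: string-union and string-intersection are the hypothetical names from the question, not functions from clojure.string or clojure.set. It also swaps in a different technique from the set-then-sort approach above: it uses distinct and filter so that lines keep their first-seen order rather than being sorted, which may or may not be what you want. In a ClojureScript source file the require would normally go in the ns form instead.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helpers named after the question; not part of any library.

(defn string-union
  "Returns the lines of s, then the lines of t not already seen,
  joined with newlines. Duplicates are dropped; first-seen order is kept."
  [s t]
  (->> (concat (str/split-lines s) (str/split-lines t))
       distinct
       (str/join "\n")))

(defn string-intersection
  "Returns the lines of s that also occur in t, joined with newlines.
  Duplicates are dropped; the order is the order of appearance in s."
  [s t]
  (let [t-lines (set (str/split-lines t))]   ; a set works as a membership predicate
    (->> (str/split-lines s)
         (filter t-lines)
         distinct
         (str/join "\n"))))

(string-union "A\nB\nC" "B\nC\nD")
;; => "A\nB\nC\nD"

(string-intersection "A\nB\nC" "B\nC\nD")
;; => "B\nC"
```

If sorted output is preferred, insert a sort step before str/join in either function.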