java regex clojure

Regex: replace hyphens with spaces in words that contain only letters and are NOT inside quotes


This regex: \b([A-z*]+)-(?=[A-z*]+\b)

with this replacement (note the trailing space): "$1 "

Applied on:

Jean-Pierre bought "blue-green-red" product-2345 and other blue-red stuff.

Gives me:

Jean Pierre bought "blue green red" product-2345 and other blue red stuff.

While I want:

Jean Pierre bought "blue-green-red" product-2345 and other blue red stuff.

https://regex101.com/r/SJzAaP/1
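
For reference, the behaviour described above can be reproduced in Java with a minimal sketch like the one below. The class and method names (`NaiveHyphenDemo`, `replaceNaively`) are made up for illustration; only the pattern and the replacement come from the question.

```java
import java.util.regex.Pattern;

final class NaiveHyphenDemo {
    // The question's original pattern: it has no notion of quoted text,
    // so hyphens inside quotes are treated like any other hyphen.
    private static final Pattern NAIVE =
            Pattern.compile("\\b([A-z*]+)-(?=[A-z*]+\\b)");

    static String replaceNaively(String input) {
        // "$1 " re-inserts the word before the hyphen, followed by a space.
        return NAIVE.matcher(input).replaceAll("$1 ");
    }

    public static void main(String[] args) {
        String s = "Jean-Pierre bought \"blue-green-red\" product-2345 and other blue-red stuff.";
        System.out.println(replaceNaively(s));
        // => Jean Pierre bought "blue green red" product-2345 and other blue red stuff.
    }
}
```

A side effect worth noting: the range `A-z` also covers the ASCII characters between `Z` and `a`, including `_`, so this pattern would turn `product_a-b` into `product_a b` as well.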

EDIT:

I am using Clojure (Java)

EDIT 2:

yellow-black-white -> yellow black white

product_a-b -> product_a-b

EDIT 3: Accepted answer translated to Clojure

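(clojure.string/replace
 "Jean-Pierre bought \"blue-green-red\" product-2345 and other blue-red-green stuff yellow-black-white product_a-b"
 #"(\"[^\"]*\")|\b([a-zA-Z]+)-(?=[a-zA-Z]+\b)"
 ;; The fn receives [whole-match group-1 group-2]: keep quoted text as is,
 ;; otherwise return the word followed by a space instead of the hyphen.
 (fn [[whole quoted word]] (if quoted whole (str word " "))))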

;;=> "Jean Pierre bought \"blue-green-red\" product-2345 and other blue red green stuff yellow black white product_a-b"

Solution

  • In Java, you may use something like

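    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String s = "Jean-Pierre bought \"blue-green-red\" product-2345 and other blue-red stuff. yellow-black-white. product_a-b";
    StringBuffer result = new StringBuffer();
    // Group 1: a quoted span; Group 2: letters before a hyphen that joins two all-letter words
    Matcher m = Pattern.compile("(\"[^\"]*\")|\\b([a-zA-Z]+)-(?=[a-zA-Z]+\\b)").matcher(s);
    while (m.find()) {
        if (m.group(1) != null) {
            // Quoted text: copy it back unchanged; quoteReplacement guards against '$' and '\'
            m.appendReplacement(result, Matcher.quoteReplacement(m.group(0)));
        } else {
            // Unquoted word: keep the letters and put a space where the hyphen was
            m.appendReplacement(result, m.group(2) + " ");
        }
    }
    m.appendTail(result);
    System.out.println(result.toString());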
    // => Jean Pierre bought "blue-green-red" product-2345 and other blue red stuff. yellow black white. product_a-b
    


    The regex is

    ("[^"]*")|\b([a-zA-Z]+)-(?=[a-zA-Z]+\b)
    

    Details

    • ("[^"]*") - Group 1: a ", then 0+ chars other than ", then a closing "
    • | - or
    • \b - a word boundary
    • ([a-zA-Z]+) - Group 2: 1+ ASCII letters (may be replaced with (\p{L}+) to match any Unicode letter)
    • - - a hyphen
    • (?=[a-zA-Z]+\b) - a positive lookahead that, immediately to the right of the current location, requires 1+ letters and a word boundary.

    If Group 1 matched (m.group(1) != null), the whole match is written back to the result unchanged, so quoted text keeps its hyphens. Otherwise, the Group 2 value is written back followed by a space, which takes the place of the hyphen.
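
The same branching can be wrapped in a reusable method. The sketch below is one possible variant, not the code from the demo: it uses `Matcher.replaceAll` with a function (available since Java 9), swaps in `\p{L}` as suggested in the details above, and sets `Pattern.UNICODE_CHARACTER_CLASS` so that `\b` also treats non-ASCII letters as word characters. The class and method names (`HyphenToSpace`, `replaceHyphens`) and the second sample string are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HyphenToSpace {
    // Group 1: a quoted span, tried first so its hyphens are consumed untouched.
    // Group 2: a run of letters followed by a hyphen and another all-letter word.
    // UNICODE_CHARACTER_CLASS makes \b treat non-ASCII letters as word characters.
    private static final Pattern P = Pattern.compile(
            "(\"[^\"]*\")|\\b(\\p{L}+)-(?=\\p{L}+\\b)",
            Pattern.UNICODE_CHARACTER_CLASS);

    static String replaceHyphens(String input) {
        // Matcher.replaceAll(Function) requires Java 9 or later.
        // quoteReplacement keeps '$' and '\' in the matched text literal.
        return P.matcher(input).replaceAll(m ->
                m.group(1) != null
                        ? Matcher.quoteReplacement(m.group())            // quoted: keep verbatim
                        : Matcher.quoteReplacement(m.group(2) + " "));   // word: hyphen becomes a space
    }

    public static void main(String[] args) {
        System.out.println(replaceHyphens(
                "Jean-Pierre bought \"blue-green-red\" product-2345 and other blue-red stuff. product_a-b"));
        // => Jean Pierre bought "blue-green-red" product-2345 and other blue red stuff. product_a-b

        // Non-ASCII letters, written as Unicode escapes to keep the source ASCII-only.
        System.out.println(replaceHyphens("Zo\u00eb-H\u00e9l\u00e8ne bought \"bleu-vert\" item-42"));
    }
}
```

Note that the quoted branch only matches a complete `"..."` pair, so a stray unmatched quote does not protect the hyphens that follow it.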

    The Clojure version from the question, repeated here for better visibility:

    (def s "Jean-Pierre bought \"blue-green-red\" product-2345 and other blue-red stuff. yellow-black-white. product_a-b")
    
    ;; Receives [whole-match group-1 group-2]; quoted text is returned unchanged.
    (defn append [[whole quoted word]] (if quoted whole (str word " ")))
    
    (clojure.string/replace s #"(\"[^\"]*\")|\b([a-zA-Z]+)-(?=[a-zA-Z]+\b)" append)
    
    ;;=> "Jean Pierre bought \"blue-green-red\" product-2345 and other blue red stuff. yellow black white. product_a-b"