
Regex in Java - Extract String between certain symbols


I know that this type of question has been asked several times (for example here or here), but my problem seems to be different, since I can't find a solution that fits.

Suppose I'm given this String:

key=false hotel = trivago foo cool='tr ue' feels="good"

(Note that every whitespace character is there on purpose.)

I need to extract each key-value pair, so key=false is one of them. If a word is not followed by "=" (after optional whitespace), it should map to null. Otherwise, and this is the relevant part, if the value is enclosed in ' or ", I should store whatever is between the quotes. For example, the string above should produce this map:

{key=false, hotel=trivago, foo=null, cool=tr ue, feels=good}

I've tried all sorts of patterns for this problem. The closest I've come is this: ([a-zA-Z0-9]+)\s*[= ]+(?![^\"']*[\"'])

The idea is: match a word made of letters and digits ([a-zA-Z0-9]+), followed by optional whitespace, then a "=". This part isn't the real issue (I think, at least). The issue is my desired group(2): to handle values like "stuff" or 'wo wsers', I looked at the questions linked above and considered using a negative lookahead.

And I think that is exactly what I need: group(2) should contain whatever follows the = symbol. If the value starts with ", take everything up to the next "; the same goes for '. If there is no quote at all, stop at the next whitespace.

I've been trying for hours, but I'm stuck. Can anyone help me? If you have any more questions, feel free to ask!

Edit: Since I was asked to provide a few more examples, here goes:

example with a lot of words. Makes=sense.

Should return

{example=null, with=null, a=null, lot=null, of=null, words.=null, Makes=sense.}

Another example:

Did=you know that=I like=           "cod ing"?

Should return

{Did=you, know=null, that=I, like=cod ing?}

Solution

  • One option is to iterate over all regex matches with Matcher.find():

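    import java.util.HashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class Main {
        public static void main(String[] args) {
            String input = "key=false hotel    =       trivago foo       cool='tr  ue' feels=\"good\"";
            Map<String, String> map = new HashMap<>();
            // group 1: key; groups 2-4: single-quoted, double-quoted, or bare value
            String regex = "(\\S+)\\s*=\\s*(?:'(.*?)'|\"(.*?)\"|(\\S+))";
            Pattern r = Pattern.compile(regex);
            Matcher m = r.matcher(input);
            while (m.find()) {
                // Exactly one of groups 2, 3 and 4 is non-null for each match
                String value = m.group(2);
                if (value == null) {
                    value = m.group(3) == null ? m.group(4) : m.group(3);
                }
                map.put(m.group(1), value);
            }

            System.out.println(map);

            // Prints (HashMap iteration order, not insertion order):
            // {feels=good, cool=tr  ue, hotel=trivago, key=false}
        }
    }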
    

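    The regex pattern used here first tries to match a value enclosed in single or double quotes. Failing that, it treats the value as any contiguous run of non-whitespace characters (\S+). Note that a word with no "=" after it (such as foo) is never matched by this pattern, so it is missing from the map rather than mapped to null.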
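
To also cover words without "=" and characters that directly follow a closing quote (the "cod ing"? case from the question), the pattern could be extended along the following lines. This is only a sketch, not a definitive implementation: the class name KeyValueParser and its parse method are hypothetical, and appending the text after the closing quote to the value is an assumption inferred from the asker's third example. It makes the "=value" part optional, uses a backreference so the closing quote must equal the opening one, and stores pairs in a LinkedHashMap so they keep their input order.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
final class KeyValueParser {

    // group 1: key (no whitespace, no '=')
    // group 2: opening quote, group 3: quoted text,
    // group 4: non-whitespace text directly after the closing quote (assumed to belong to the value)
    // group 5: unquoted value
    private static final Pattern PAIR = Pattern.compile(
            "([^\\s=]+)(?:\\s*=\\s*(?:(['\"])(.*?)\\2(\\S*)|(\\S+)))?");

    static Map<String, String> parse(String input) {
        // LinkedHashMap keeps insertion order and accepts null values.
        Map<String, String> map = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            String value;
            if (m.group(3) != null) {
                // Quoted value: drop the quotes, keep any suffix such as "?"
                value = m.group(3) + m.group(4);
            } else {
                // Bare value, or null when the key had no "=value" part
                value = m.group(5);
            }
            map.put(m.group(1), value);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(parse("key=false hotel = trivago foo cool='tr ue' feels=\"good\""));
        System.out.println(parse("example with a lot of words. Makes=sense."));
        System.out.println(parse("Did=you know that=I like=           \"cod ing\"?"));
    }
}
```

Run against the three inputs from the question, this sketch should print {key=false, hotel=trivago, foo=null, cool=tr ue, feels=good}, then {example=null, with=null, a=null, lot=null, of=null, words.=null, Makes=sense.}, then {Did=you, know=null, that=I, like=cod ing?}. A value with an unclosed quote falls through to the bare-value branch and keeps its quote character.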