json regex double-quotes

Regex pattern to select boundary words excluding words inside double quotes


I give up! I have tried so many different things here, and I'm losing what little hair I have left, so if someone could help me out I'd be most grateful.

I have some badly formed JSON:

friend_in_need: {
    id: 3
    
    possible: {
        is_ironman: yes
        difficulty: {">": 1}
        has_start_date: {"<": "1936-01-02"}
        has_any_custom_difficulty_setting: no
        game_rules_allow_achievements: yes
    }
    
    happened: {
        has_country_flag: "achievement has joined faction"
    }
}

... that I am trying to clean up in several steps using regex replacement statements.

The next step is to match each key and each unquoted string value so that I can wrap them in double quotes, like the following:

"friend_in_need": {
    "id": 3
    
    "possible": {
        "is_ironman": "yes"
        "difficulty": {">": 1}
        "has_start_date": {"<": "1936-01-02"}
        "has_any_custom_difficulty_setting": "no"
        "game_rules_allow_achievements": "yes"
    }
    
    "happened": {
        "has_country_flag": "achievement has joined faction"
    }
}

I have tried various methods and had some success handling the keys first; however, I cannot find a way to select the string values while excluding those already in quotes. I'd be more than happy to do this in multiple steps if necessary.

For example, I know these parts get me closer:

(?:\".+?\") matches everything between double quotes

\b([a-zA-Z0-9@_]+)\b matches boundary words (\S might be better)

But I can't combine the two so that the first is excluded from the match. I thought this would work, but it didn't:

(?!(?:\".+?\")))\b([a-zA-Z0-9@_]+)\b

Any help would be greatly appreciated.

Thanks in advance!!!


Solution

  • You can achieve this with a negative lookbehind and a negative lookahead, using something along the lines of:

    (?<![\"\w])\w+(?![\"\w])
    

    It matches every group of word characters that is neither preceded nor followed by another word character or a ".

    You can replace the \w+ in the middle to better fit your use case as needed.
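To make this concrete, here is a minimal Python sketch of the pattern above applied as a replacement. One caveat worth knowing: the lookarounds only inspect the characters directly next to a word, so words in the middle of a multi-word quoted string (and bare numbers) still match. The second function is an alternative that is not part of the answer above: it consumes already-quoted strings first and only quotes bare words that start with a letter or underscore. The function names are made up for illustration, and the sketch assumes quoted strings contain no escaped quotes.

```python
import re

# Pattern from the answer: a run of word characters that is neither
# preceded nor followed by a double quote or another word character.
ADJACENT = re.compile(r'(?<!["\w])\w+(?!["\w])')

# Alternative (an assumption, not from the answer): match a whole quoted
# string first, otherwise capture a bare word starting with a letter or "_".
QUOTED_OR_BARE = re.compile(r'"[^"]*"|\b([A-Za-z_]\w*)\b')


def quote_adjacent(text: str) -> str:
    """Wrap every match of the answer's pattern in double quotes."""
    return ADJACENT.sub(r'"\g<0>"', text)


def quote_bare_words(text: str) -> str:
    """Quote bare words, leaving quoted strings and numbers untouched."""
    def repl(match: re.Match) -> str:
        if match.group(1) is None:
            # The quoted-string branch matched: return it unchanged.
            return match.group(0)
        return f'"{match.group(1)}"'

    return QUOTED_OR_BARE.sub(repl, text)


if __name__ == "__main__":
    for line in (
        'is_ironman: yes',
        'id: 3',
        'has_country_flag: "achievement has joined faction"',
    ):
        print(quote_adjacent(line))
        print(quote_bare_words(line))
```

On `is_ironman: yes` both functions produce `"is_ironman": "yes"`. They differ on the other two lines: `quote_adjacent` turns `id: 3` into `"id": "3"` and also quotes `has` and `joined` inside the existing string, while `quote_bare_words` leaves the number and the quoted string alone.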