Tags: regex, json, regex-lookarounds, negative-lookbehind

Can't make regular expression with negative lookaheads work to fix unescaped quotes in JSON


I have some JSON code in the following format:

[
  { "abc ": "d ef", "g": "h i", "jk lm no": "pq", "r st": "uvw xyz" },
  { "!1 2": " 3", "4 ": "5 6 7", " 8 ": "9 abc", "def": "hi "NAME" jk" },
  ...
]

I need to add backslashes in front of the quotes around NAME so that this JSON parses correctly. The string above should end up looking like this:

[
  { "abc ": "d ef", "g": "h i", "jk lm no": "pq", "r st": "uvw xyz" },
  { "!1 2": " 3", "4 ": "5 6 7", " 8 ": "9 abc", "def": "hi \"NAME\" jk" },
  ...
]

I tried using regex to replace (?!({ |": |", ))"(?!( }|: "|, ")) with '\\\\"', but I get:

[
  { \"abc ": \"d ef", \"g": \"h i", \"jk lm no": \"pq", \"r st": \"uvw xyz" },
  { \"!1 2": \" 3", \"4 ": \"5 6 7", \" 8 ": \"9 abc", \"def": \"hi \"NAME\" jk" },
  ...
]

Please help me write a correct regular expression.


Solution

  • Try this regex:

    (?<![{,:] )"(?![:,]| })
    

    Demo: http://regex101.com/r/tJ2dG0
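
As a rough illustration of how the pattern could be applied, here is a minimal Python sketch. Python is only an assumption (the question does not name a language), and the sample input is the question's data with the "..." row left out so that the result can be parsed. The names UNESCAPED_QUOTE, broken and fixed are made up for this example. Note that the pattern relies on the spacing shown in the question (a space after "{", "," and ":" and before "}"), so input formatted differently would likely need a different pattern.

```python
import json
import re

# Pattern from the answer: a quote that is neither preceded by "{ ", ", " or ": "
# nor followed by ":", "," or " }".
UNESCAPED_QUOTE = re.compile(r'(?<![{,:] )"(?![:,]| })')

# Sample input modelled on the question (the "..." row is omitted so it parses).
broken = '''[
  { "abc ": "d ef", "g": "h i", "jk lm no": "pq", "r st": "uvw xyz" },
  { "!1 2": " 3", "4 ": "5 6 7", " 8 ": "9 abc", "def": "hi "NAME" jk" }
]'''

# A function replacement avoids backslash processing in the replacement string.
fixed = UNESCAPED_QUOTE.sub(lambda m: '\\"', broken)

data = json.loads(fixed)
print(data[1]["def"])  # hi "NAME" jk
```

Only the two quotes around NAME are rewritten; every quote that delimits a key or a value is left alone.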


    Discussion

    Firstly, I assume that your regex flavor supports lookbehind.

    Secondly, how did I find this regex? When you build a regular expression, you either build it to match what you want or to match what you don't want. I use the latter here: first match the quotes that must stay as they are, then negate that pattern.

    This is the regex for matching valid double quotes, that is, the ones that open or close a key or a value:

    (?<=[{,:] )"|"(?=[:,]| })
    

    Demo: http://regex101.com/r/oX4uM5
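
For readers without access to the demo, the same check can be sketched in Python (an assumed language; the names VALID_QUOTE, line, valid and stray are illustrative only). It lists the positions of the quotes that R matches on one shortened row and the positions it leaves out.

```python
import re

# R from the answer: quotes that open or close a JSON key or value.
VALID_QUOTE = re.compile(r'(?<=[{,:] )"|"(?=[:,]| })')

# One shortened row from the question, containing the unescaped "NAME".
line = '{ "def": "hi "NAME" jk" }'

all_quotes = [i for i, ch in enumerate(line) if ch == '"']
valid = [m.start() for m in VALID_QUOTE.finditer(line)]
stray = [i for i in all_quotes if i not in valid]

print(valid)  # [2, 6, 9, 22]
print(stray)  # [13, 18]
```

Positions 13 and 18 are the quotes around NAME, which are exactly the ones R skips.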

    As you can see in the demo, the regex (let's call it R) doesn't match the invalid quotes. So the regex we are looking for is its opposite (i.e. !R), in a particular sense: we negate the lookbehind and the lookahead, but not the quote itself.

    So

    • (?<=...) becomes (?<!...)
    • (?=...) becomes (?!...)
    • "|" (read it as " OR ") becomes " AND ": both negated conditions must hold for the same quote, so the two quotes collapse into a single "

    hence the final regex at the top of this answer.
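
One way to sanity-check this derivation is to confirm that R and its negation split the quotes of a row into two disjoint groups that together cover every quote. The Python sketch below does that on a row from the question. The language and the names VALID_QUOTE, STRAY_QUOTE and sample are assumptions made for the illustration.

```python
import re

VALID_QUOTE = re.compile(r'(?<=[{,:] )"|"(?=[:,]| })')  # R
STRAY_QUOTE = re.compile(r'(?<![{,:] )"(?![:,]| })')    # !R, the final regex

# Second row of the question's data.
sample = '{ "!1 2": " 3", "4 ": "5 6 7", " 8 ": "9 abc", "def": "hi "NAME" jk" }'

every_quote = {i for i, ch in enumerate(sample) if ch == '"'}
valid = {m.start() for m in VALID_QUOTE.finditer(sample)}
stray = {m.start() for m in STRAY_QUOTE.finditer(sample)}

# Each quote is matched by exactly one of the two patterns.
print(valid | stray == every_quote)  # True
print(valid & stray == set())        # True
```

This holds for any input, since a quote either satisfies one of the positive lookarounds or fails both of them.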