javascript regex vbscript flat-file ecma262

maintain text but eliminate CR LF between tags


Fellow Regexers,

I have a flat file full of expressions like:

SELECT * FROM CONVENIENT_ONE_LINE_QUERY
"SELECT * FROM THIS_QUERY
WHERE IS_SPREAD_OVER == 123
ORDER BY MULTIPLE_LINES
HAVING AND_IS_BETWEEN_QUOTES"
SELECT * FROM ANOTHER_CONVENIENT_ONE_LINER

I want to eliminate the CRLFs between the quotes, and the quotes themselves, so that all my queries are convenient one-liners like this:

SELECT * FROM CONVENIENT_ONE_LINE_QUERY
SELECT * FROM THIS_QUERY WHERE IS_SPREAD_OVER == 123 ORDER BY MULTIPLE_LINES HAVING AND_IS_BETWEEN_QUOTES
SELECT * FROM ANOTHER_CONVENIENT_ONE_LINER

Please post the RegEx flavor used in the solution. I'm using TextCrawler, which claims to be ECMA262 (same as VBScript/JavaScript), and the closest I have come to a solution is something like:

(\r\n".*)(.*)\r\n(.*"\r\n)

Forgive my n00biness. Best regards, Lynx Kepler


Solution

  • You could first replace each CRLF with a space if the next " after it is at the end of a line (JavaScript syntax):

    result = subject.replace(/\r\n(?=[^"]*"$)/mg, " ");
    

    Explanation:

    \r\n    # Match a CRLF
    (?=     # if and only if
     [^"]*  # it is followed by any number of non-quote characters
     "      # and a quote
     $      # at the end of a line
    )       # End of lookahead.
    

    This transforms your example into

    SELECT * FROM CONVENIENT_ONE_LINE_QUERY
    "SELECT * FROM THIS_QUERY WHERE IS_SPREAD_OVER == 123 ORDER BY MULTIPLE_LINES HAVING AND_IS_BETWEEN_QUOTES"
    SELECT * FROM ANOTHER_CONVENIENT_ONE_LINER
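
    To see which line breaks the lookahead actually accepts, here is a small sketch that lists the lines whose trailing CRLF matches. It assumes a runtime with String.prototype.matchAll (for example a recent Node.js); the names sample and linesJoinedToNext are made up for this illustration.

    ```javascript
    // Sample input from the question, joined with CRLF line endings.
    const sample = [
      'SELECT * FROM CONVENIENT_ONE_LINE_QUERY',
      '"SELECT * FROM THIS_QUERY',
      'WHERE IS_SPREAD_OVER == 123',
      'ORDER BY MULTIPLE_LINES',
      'HAVING AND_IS_BETWEEN_QUOTES"',
      'SELECT * FROM ANOTHER_CONVENIENT_ONE_LINER',
    ].join('\r\n');

    // Hypothetical helper: returns the 1-based numbers of the lines whose
    // trailing CRLF is matched by the pattern, i.e. lines that get joined
    // to the line that follows them.
    function linesJoinedToNext(text) {
      const joinable = /\r\n(?=[^"]*"$)/mg;
      return [...text.matchAll(joinable)].map(
        // Everything before the match, split on CRLF, gives the line count so far.
        (m) => text.slice(0, m.index).split('\r\n').length
      );
    }

    console.log(linesJoinedToNext(sample)); // [ 2, 3, 4 ]
    ```

    Only the breaks after lines 2, 3 and 4 match. The break after line 1 is rejected because the next " opens a line instead of ending one, and the break after line 5 is rejected because no " follows it at all.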
    

    Then, in a second step, remove the quotes from that result:

    result = result.replace(/^"|"$/mg, "");
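
For reference, the two steps can be chained so that the second replacement runs on the output of the first. This is only a minimal sketch in plain JavaScript, assuming CRLF line endings as in the question. The function name flattenQuotedQueries is invented here, and whether TextCrawler accepts the same patterns has not been checked.

```javascript
// Hypothetical helper; the two patterns are the ones given in the answer.
function flattenQuotedQueries(text) {
  // Step 1: replace a CRLF with a space when the next quote after it
  // sits at the end of a line (so the CRLF is inside a quoted query).
  const joined = text.replace(/\r\n(?=[^"]*"$)/mg, ' ');
  // Step 2: strip a quote at the start or at the end of any line.
  return joined.replace(/^"|"$/mg, '');
}

// Sample input from the question, joined with CRLF line endings.
const input = [
  'SELECT * FROM CONVENIENT_ONE_LINE_QUERY',
  '"SELECT * FROM THIS_QUERY',
  'WHERE IS_SPREAD_OVER == 123',
  'ORDER BY MULTIPLE_LINES',
  'HAVING AND_IS_BETWEEN_QUOTES"',
  'SELECT * FROM ANOTHER_CONVENIENT_ONE_LINER',
].join('\r\n');

const output = flattenQuotedQueries(input);
console.log(output.split('\r\n').length); // 3
console.log(output);
```

Both patterns rely on the m flag, so that ^ and $ match at every line boundary and not just at the start and end of the whole text.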