javascript, regex, ecmascript-5

What JavaScript regex do I need to distinguish between a comment and a file path?


I need a regex. It need not be complex but must cover all bases. Requirements follow:

The file format I am forced to follow looks like this:

chest/.setup.js
chest/**/*-chest.js
--choppers something:hello-there/wasup
--respite  spoc
--chow ./chest//test.bootstrap#1
--chow ./chest/server.bootstrap#
--blow 200

I must support thousands of other files that have a similar look.

I want to support comments in these files using just one of // or #.

My code needs to rip out the comments from the file contents before processing using regex matching.

I haven't decided which comment syntax to use yet (please no answers that lump both together since only one will be used).

A comment may be at the start of a line (commenting out the whole line) or on the same line at the end (commenting out everything after it).

File paths may be "naive" and contain double slashes like so ... /path//to/file/example.js

Also remember that # is a valid filename character and filenames can contain spaces on some operating systems.

My questions are:

(1) What regex is needed to rip out comments if I use the // syntax?

(2) What regex is needed to rip out comments if I use the # syntax?

Please feel free to answer (1) or (2) or both individually but not together.

If there are any other considerations that you feel I should take into account, please advise. Answers preferred in ES5 syntax (an annoying restriction).


Solution

  • To match only the lines that do not start with --, you can use this regex pattern:

    var noDoubleDashComments = /^(?!\s*--).*$/gm;
    

    ^ : Matches the beginning of the string, or the beginning of a line if the multiline flag (m) is enabled. This matches a position, not a character.

    (?!\s*--) : a negative lookahead that rejects lines starting with zero or more whitespace characters followed by two dashes

    .*$ : any characters up to the end of the line

    Flags

    g : global search. Finds all occurrences instead of only the first.

    m : When the multiline flag is enabled, beginning and end anchors (^ and $) will match the start and end of a line, instead of the start and end of the whole string.
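
    As a minimal sketch of how the pattern and its flags behave, the snippet below runs it over a made-up multi-line string with String.prototype.match. The sample text and the variable names sample and kept are assumptions for illustration, and the sketch follows this answer in treating -- as the comment marker.

```javascript
var noDoubleDashComments = /^(?!\s*--).*$/gm;

// Hypothetical file contents: two path lines and two "--" comment lines.
var sample = [
  "chest/.setup.js",
  "-- a whole-line comment",
  "  -- an indented comment",
  "chest/**/*-chest.js"
].join("\n");

// With the g flag, match() returns every matching line, or null if none match.
var kept = sample.match(noDoubleDashComments) || [];

console.log(kept);
// [ 'chest/.setup.js', 'chest/**/*-chest.js' ]
```

    Joining the result with kept.join("\n") gives the file contents back without the comment lines. Because \s also matches line breaks, an empty line directly above a comment line is dropped as well.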

    Other comment styles :

    var noDoubleForwardSlashComments = /^(?!\s*\/{2}).*$/gm;
    
    var noHashComments = /^(?!\s*#).*$/gm;
    
    var noHashOrDashOrSlashComments = /^(?!\s*(?:\/\/|--|#)).*$/gm;
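
    To check these variants against the kind of lines shown in the question, here is a small sketch. The sample strings are made up for illustration; the point is that a double slash or a # inside a path does not trigger the lookahead, because it only inspects the start of the line.

```javascript
var noDoubleForwardSlashComments = /^(?!\s*\/{2}).*$/gm;
var noHashComments = /^(?!\s*#).*$/gm;

// Hypothetical contents using "//" as the comment marker.
var slashSample = [
  "// whole-line comment",
  "--chow ./chest//test.bootstrap#1",
  "/path//to/file/example.js"
].join("\n");

// Hypothetical contents using "#" as the comment marker.
var hashSample = [
  "# whole-line comment",
  "--chow ./chest/server.bootstrap#",
  "--blow 200"
].join("\n");

var slashKept = slashSample.match(noDoubleForwardSlashComments) || [];
var hashKept = hashSample.match(noHashComments) || [];

console.log(slashKept);
// [ '--chow ./chest//test.bootstrap#1', '/path//to/file/example.js' ]
console.log(hashKept);
// [ '--chow ./chest/server.bootstrap#', '--blow 200' ]
```

    One caveat: a line whose path itself begins with // or whose filename begins with # would be treated as a comment by these patterns.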
    

    What if a line has text followed by a double-dash comment?
    For example:

    it's over 9000 -- DBZ reference
    

    You could use something like the pattern below, which matches only the text before the comment, or the whole line if there is no comment:

    var noDoubleDashCommentsAtAll = /^(?!\s*--).+?(?=\s*--|$)/gm;
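
    The sketch below exercises that pattern on made-up input. It then shows what happens if the same idea is applied to // comments: a path containing a double slash gets cut short. The last pattern, slashCommentsWithSpace, is a hypothetical variant that is not part of the original answer. It assumes a trailing comment is always preceded by at least one whitespace character, which is a convention you would have to enforce in your files.

```javascript
var noDoubleDashCommentsAtAll = /^(?!\s*--).+?(?=\s*--|$)/gm;

var sample = [
  "it's over 9000 -- DBZ reference",
  "-- whole-line comment",
  "no comment here"
].join("\n");

var kept = sample.match(noDoubleDashCommentsAtAll) || [];
console.log(kept);
// [ "it's over 9000", 'no comment here' ]

// The same approach with "//" truncates paths that contain a double slash.
var pathSample = "--chow ./chest//test.bootstrap#1 // trailing comment";
var naiveSlash = /^(?!\s*\/\/).+?(?=\s*\/\/|$)/gm;
var naiveKept = pathSample.match(naiveSlash) || [];
console.log(naiveKept);
// [ '--chow ./chest' ]

// Hypothetical variant (an assumption, not from the answer): require
// whitespace before a trailing "//" so that "//" inside a path is ignored.
var slashCommentsWithSpace = /^(?!\s*\/\/).+?(?=\s+\/\/|$)/gm;
var pathKept = pathSample.match(slashCommentsWithSpace) || [];
console.log(pathKept);
// [ '--chow ./chest//test.bootstrap#1' ]
```

    The whitespace requirement is still ambiguous for filenames that contain a space followed by //, so it is a trade-off rather than a complete solution.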