Tags: javascript, node.js, regex, fs, end-of-line

Regex working in debugger, but not in JavaScript


I want to get all the content in a text file before the first empty line.

I've found a regex that works in the debugger, but when I try the same thing in JavaScript it doesn't match.

(loading the file's contents is working)

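const fs = require('fs');
const path = require('path');

// Resolve with the list of entries in ./content.
async function readDir() {
    return new Promise((resolve, reject) => {
        fs.readdir('./content', (err, files) => {
            if (err) { return reject(err); } // return so resolve() is not also called
            resolve(files);
        });
    });
}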

readDir().then((files) => {
    files.forEach(file => {
        var filepath = path.resolve('./content/'+file)
        if(filepath.endsWith('.txt')) {
            if(fs.statSync(filepath)["size"] > 0) {
                let data = fs.readFileSync(filepath).toString();
                let reg = /^[\s\S]*?(?=\n{2,})/;
                console.log(data.match(reg)) //returns null
            }
        }
    });
})

EDIT:

As O. Jones pointed out, the problem lies with the line endings. My regex was not picking up on \r\n line endings present in my file.

For now, this one seems to do the job: /^[\s\S]*?(?=(\r\n\r\n?|\n\n))/m
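
To make the difference concrete, here is a minimal sketch that runs the regexes from this question against a made-up string with Windows line endings. The sample text and variable names are assumptions for illustration only; the third pattern is the one from the EDIT, and the second is the original with only the m flag added.

```javascript
// Hypothetical sample mimicking a file saved with Windows (\r\n) line endings.
const crlfText = 'title\r\nauthor\r\n\r\nbody';

const original = /^[\s\S]*?(?=\n{2,})/;            // regex from the question
const originalWithM = /^[\s\S]*?(?=\n{2,})/m;      // same regex plus the m flag
const revised = /^[\s\S]*?(?=(\r\n\r\n?|\n\n))/m;  // regex from the EDIT

const originalResult = crlfText.match(original);
const withMResult = crlfText.match(originalWithM);
const revisedResult = crlfText.match(revised);

// The text never contains two \n characters in a row, so the first two fail.
console.log(originalResult);
console.log(withMResult);
// JSON.stringify escapes the match so the \r\n characters are visible.
console.log(JSON.stringify(revisedResult[0]));
```

In this sample the blank line is stored as \r\n\r\n, so a lookahead for \n{2,} has nothing to match, with or without the m flag.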


Solution

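  • It looks like you want to match your re against the whole, multiline contents of your file. You can add the multiline flag for that, but note that it only changes where ^ and $ match (at every line break rather than only at the ends of the string); [\s\S] already matches across newlines without it, so the flag by itself may not fix the problem.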

    Try this

    let reg = /^[\s\S]*?(?=\n{2,})/m;
    

    Notice the m after the re's closing /. For more explanation see the section called Advanced Searching With Flags here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions

    Also, it's possible you have line-ending trouble. Linux, FreeBSD, modern macOS and other UNIX-like systems use \n (newline) to mark the end of each line. Classic Mac OS (before OS X) used \r (return) for that. And Windows uses \r\n, two characters at the end of each line. Yeah, we all know what a pain in the neck this is.

    So your blank line detector is probably too simple. See the question "Regular Expression to match cross platform newline characters" for background. Try using this to match cross-OS ends of lines:

    \r\n?|\n
    

    meaning either a return followed by an optional newline, or just a newline.

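    Two of those in a row mark a blank line, but simply repeating the pattern as (\r\n?|\n)(\r\n?|\n) is not safe: the engine can backtrack and split a single \r\n into \r plus \n, so an ordinary Windows line ending would count as a blank line. Spelling out the pairs avoids that. It might look something like this.

    let reg = /^[\s\S]*?(?=(?:\r?\n){2}|\r\r)/;
    

    That looks for \n\n, \r\n\r\n (or a mix of the two), or \r\r.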
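
As a rough sketch of the same idea without a lookahead, the helper below finds the first blank line with String.prototype.search and cuts the text there. The name firstParagraph and the sample strings are hypothetical, and the pattern (?:\r?\n){2}|\r\r is only one way to spell "two line endings in a row"; it is written so that a single \r\n cannot be counted twice.

```javascript
// Hypothetical helper: returns the text before the first blank line,
// or the whole text when no blank line exists.
function firstParagraph(text) {
  // Two consecutive line endings: \n\n, \r\n\r\n (or a mix), or \r\r.
  const blankLine = /(?:\r?\n){2}|\r\r/;
  const index = text.search(blankLine);
  return index === -1 ? text : text.slice(0, index);
}

// Made-up samples, one per line-ending convention.
const samples = {
  unix: 'title\nauthor\n\nbody',
  windows: 'title\r\nauthor\r\n\r\nbody',
  classicMac: 'title\rauthor\r\rbody',
};

for (const [name, text] of Object.entries(samples)) {
  // JSON.stringify escapes the result so the line endings stay visible.
  console.log(name, JSON.stringify(firstParagraph(text)));
}
```

In the loop from the question, this would be called as firstParagraph(data) in place of data.match(reg).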