Search code examples
javascriptparsingcompiler-construction

Find if a string line is nested or is a child of another line


I'm looking to write a small parser for some kind of files, and one of the things I have to accomplish is to find if a line is inside another one, defining this with indentation (spaces or tabs).

Example:

This is the main line
    This is a nested or child line

I'm trying to establish this by reading the first character position in the line and comparing it with the previous one with something like this:

var str = '      hello';
str.indexOf(str.match(/\S|$/).shift());

I'm sure this is not the best way and it looks horrible, also I have another issues to address, like checking if the indentation is made by spaces (2 or 4), or tabs, or passing/maintaining an state of the previous line (object).

Also, lines can be infinitely nested and of course I'm looking more for a nice and performant algorithm (or idea), or pattern rather than a simple check that I think is relatively easy to do but error prone. I'm sure it is already solve by people who works with parsers and compilers.


Edit:

str.search(/\S/);

@Oriol proposal looks much better


Solution

  • This is generally the kind of thing you write a parser for, rather than purely relying on regex. If the nesting determines the depth, then you have two things to solve: 1) find the depth for an arbitrary line, and 2) iterate through the set of lines and track, for each line, which preceding line has a lower depth value.

    The first is trivial if you are familiar with the RegExp functions in Javascript:

    function getDepth(line) {
      // find leading white space
      var ws = str.match(/^(\s+)/);
      // no leading white space?
      if (ws === null) return 0;
      // leading white space -> count the white space symbols.
      // obviously this goes wrong for mixed spaces and tabs, and that's on you.
      return ws[0].split('').length;
    }
    

    The second part is less trivial, and so you have several options. You could iterate through all the lines, and track the list of line numbers, pushing onto the list as you go deeper and popping from the list as you go back up, or you can build a simple tree structure (which is generally far better because it lets you expand its functionality much more easily) using standard tree building approached.

    function buildTree(lines, depths) {
      if (!depths) {
        var depths = lines.map(e => getDepth);
        return buildTree(lines, depths);
      }
      var root = new Node();
      for(var pos=0, end=lines.length; pos<end; pos++) {
        var line = lines[pos];
        var depth = depths[pos];
        root.insert(line, depth);  
      }
    }
    

    With a simple Node object, of course

    var Node = function(text, depth) {
      this.children = [];
      this.line = text.replace(/^\s+/,'');
      this.depth = depth;
    }
    
    Node.prototype = {
      insert: function(text, depth) {
        // this is where you become responsible: we need to insert
        // a new node inside of this one if the depths indicate that
        // is what should happen, but you're on the hook for determining
        // what you want to have happen if the indentation was weird, like
        // a line at depth 12 after a line at depth 2, or vice versa.
      }
    }