java parsing interpreter s-expression livecoding

How to Match Parentheses to Parse an S-Expression?

I am trying to create a function that does the following:

Assuming that the code input is "(a 1 2 (b 3 4 5 (c 6) |7) 8 9)" where the pipe | symbol is the position of the cursor,

the function returns: a String "b 3 4 5 (c 6) 7" representing the code that is in the scope of the cursor

an int 8 representing the start index of the string relative to the input

an int 22 representing the (inclusive) end index of the string relative to the input, not counting the | marker

I already have working code that returns exactly that. The problem is ignoring comments while still keeping track of context (e.g. string literals, my own literal delimiters, etc.).

Here is the code which keeps track of context:

public static void applyContext(Context context, String s, String snext, String sprev) {
        if (s.equals("\"")) {
            if (context.context == Context.Contexts.MAIN) {
                context.context = Context.Contexts.STRING;
                context.stringDelimiterIsADoubleQuote = true;
            } else if (context.context == Context.Contexts.STRING && context.stringDelimiterIsADoubleQuote && !sprev.equals("\\"))
                context.context = Context.Contexts.MAIN;
        } else if (s.equals("\'")) {
            if (context.context == Context.Contexts.MAIN) {
                context.context = Context.Contexts.STRING;
                context.stringDelimiterIsADoubleQuote = false;
            } else if (context.context == Context.Contexts.STRING && !context.stringDelimiterIsADoubleQuote && !sprev.equals("\\"))
                context.context = Context.Contexts.MAIN;
        } else if (s.equals("/") && snext.equals("/")) {
            if (context.context == Context.Contexts.MAIN)
                context.context = Context.Contexts.COMMENT;
        } else if (s.equals("\n")) {
            if(context.context == Context.Contexts.COMMENT)
                context.context = Context.Contexts.MAIN;
        }
        else if (s.equals("\\")) {
            if(context.context == Context.Contexts.MAIN)
                context.context = Context.Contexts.PATTERN;
            else if(context.context == Context.Contexts.PATTERN)
                context.context = Context.Contexts.MAIN;
        }
    }
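
The Context class is not shown here. A minimal sketch that is consistent with the calls above might look like the following. The enum constants and field names are taken from applyContext and from the constructor call used below; everything else (visibility, the constructor body) is an assumption, not the original class.

```java
// Hypothetical reconstruction: only the names used by applyContext are known.
final class Context {
    /** The lexical states that applyContext switches between. */
    enum Contexts { MAIN, STRING, COMMENT, PATTERN }

    /** Current state of the scanner. */
    Contexts context;

    /** True if the open string literal began with a double quote, false for a single quote. */
    boolean stringDelimiterIsADoubleQuote;

    Context(Contexts initial) {
        this.context = initial;
    }
}
```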

Firstly, I'll be using the function above like so:

String sampleCode = "(a b \"cdef\" g \\c4 bb2 eb4 g4v0.75\\)";
Context c = new Context(Context.Contexts.MAIN);
for(int i = 0; i < sampleCode.length(); i++) {
    String s = String.valueOf(sampleCode.charAt(i));
    String snext = i + 1 < sampleCode.length() ? String.valueOf(sampleCode.charAt(i + 1)) : "";
    String sprev = i > 0 ? String.valueOf(sampleCode.charAt(i - 1)) : "";
    applyContext(c, s, snext, sprev);
    if(c.context == blahlbah) doBlah();
}

Second, I'll be using this both forwards and backwards, because my current implementation of the function described at the top of this question is (in pseudocode) this:

function returnCodeInScopeOfCursor(theWholeCode::String, cursorIndex::int) {
  var depthOfCodeAtCursorPosition::int = getDepth(theWholeCode, cursorIndex);
  Context c = new Context(getContextAt(theWholeCode, cursorIndex));

  var currDepth::int = depthOfCodeAtCursorPosition;
  var startIndex::int, endIndex::int;

  for(i = cursorIndex; i >= 0; i--) {//going backwards
    s = .....
    snext = ......
    sprev = ......
    applyContext(c, s, snext, sprev);

    if(c.context == Context.MAIN) {
       if s = "(" then currDepth --;
       if s = ")" then currDepth ++;
    }

    when currDepth < depthOfCodeAtCursorPosition
      startIndex = i + 1;
      break;
  }

  currDepth = depthOfCodeAtCursorPosition;//reset
  for(i = cursorIndex; i < theWholeCode.length; i++) {//going forwards
    s = ...
    snex......
    sprev.....
    applyContext(c, s, snext, sprev);

    if(c.context == Context.MAIN) {
      if s = "(" then currDepth ++;
      if s = ")" then currDepth --;
    }

    when currDepth < depthOfCodeAtCursorPosition
      endIndex = i - 1;
      break;
  }

  var returnedStr = theWholeCode->from startIndex->to endIndex

  return new IndexedCode(returnedStr, startIndex, endIndex);
}
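
The two loops in the pseudocode can be sketched in Java as follows. This is a simplified illustration under stated assumptions, not the original implementation: it ignores strings, comments and patterns entirely (every parenthesis counts), it treats the cursor as a position between characters with 0 <= cursor <= code.length(), and the names ScopeFinder and scopeAt are made up for the example. The end index is inclusive, as in the pseudocode.

```java
final class ScopeFinder {
    /** Result holder mirroring the IndexedCode idea from the pseudocode; end is inclusive. */
    static final class IndexedCode {
        final String code;
        final int start;
        final int end;

        IndexedCode(String code, int start, int end) {
            this.code = code;
            this.start = start;
            this.end = end;
        }
    }

    private ScopeFinder() {}

    /**
     * Finds the contents of the innermost parenthesised form around the cursor.
     * Assumption: no strings, comments or patterns, so every paren is structural.
     */
    static IndexedCode scopeAt(String code, int cursor) {
        int start = 0;                  // fallback if there is no enclosing "("
        int end = code.length() - 1;    // fallback if there is no enclosing ")"

        int depth = 0;
        for (int i = cursor - 1; i >= 0; i--) {          // walk left from the cursor
            char ch = code.charAt(i);
            if (ch == ')') {
                depth++;                                  // stepping into a nested form
            } else if (ch == '(') {
                if (depth == 0) {                         // the paren that opens our scope
                    start = i + 1;
                    break;
                }
                depth--;                                  // leaving the nested form
            }
        }

        depth = 0;
        for (int i = cursor; i < code.length(); i++) {   // walk right from the cursor
            char ch = code.charAt(i);
            if (ch == '(') {
                depth++;
            } else if (ch == ')') {
                if (depth == 0) {                         // the paren that closes our scope
                    end = i - 1;
                    break;
                }
                depth--;
            }
        }
        return new IndexedCode(code.substring(start, end + 1), start, end);
    }
}
```

For the example input with the | marker removed and the cursor just before the 7 (cursor position 22), scopeAt returns the string b 3 4 5 (c 6) 7 with start index 8.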

As you can see, this function is meant to work both forwards and in reverse, and most of it does. The one problem is comments (denoted by the ECMAScript-style double slash "//"): when scanning backwards, the scanner reaches the comment text before it reaches the "//" that starts it, so it cannot know it is inside a comment, and the context tracking goes haywire.

I could write a separate function for reverse context application that checks every line for a double slash and then marks everything after that '//' as COMMENT (or, seen in the scanning direction, everything before that //). But that would take far too much processing time, since I want to use this in a livecoding environment for music.

Also, removing the comments before calling returnCodeInScopeOfCursor may not be feasible, because I need to keep track of the indexes into the code. If I removed the comments, I would have to track exactly what I removed, where, and how many characters, which would make a mess of all the code positions. The text area GUI I'm working with (RichTextFX) does not support line/char tracking, so everything is tracked by char index only, hence the problem.

So I'm utterly perplexed as to what to do with my current code. Any help, suggestions, or advice will be greatly appreciated.

Solution

Could you pre-transform comments from // This is a comment<CR> to { This is a comment}<CR>? You then have a language with explicit open and close delimiters for comments, which you can walk both backwards and forwards.

Apply this transform on the way in and reverse it on the way out and all should be well. Notice that we are replacing //... with {...}: the two slashes become one opening brace plus one closing brace, so the length is unchanged and all character offsets are retained.
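
A minimal sketch of the forward transform in Java is shown below. The class and method names (CommentMasker, mask) are hypothetical. It assumes that { and } have no other meaning in the language and do not occur inside comment text, that string literals use " or ' with backslash escapes, and that a backslash outside a string toggles the question's PATTERN context, inside which // is not a comment.

```java
final class CommentMasker {
    private enum State { MAIN, STRING, PATTERN }

    private CommentMasker() {}

    /** Rewrites each "//...end-of-line" comment as "{...}" without changing the length. */
    static String mask(String code) {
        StringBuilder out = new StringBuilder(code.length());
        State state = State.MAIN;
        char quote = 0;                       // opening quote of the current string literal
        int i = 0;
        while (i < code.length()) {
            char ch = code.charAt(i);
            switch (state) {
                case STRING:
                    out.append(ch);
                    if (ch == '\\' && i + 1 < code.length()) {
                        out.append(code.charAt(++i));     // copy the escaped char untouched
                    } else if (ch == quote) {
                        state = State.MAIN;               // unescaped closing quote
                    }
                    i++;
                    break;
                case PATTERN:
                    out.append(ch);
                    if (ch == '\\') {
                        state = State.MAIN;               // closing pattern delimiter
                    }
                    i++;
                    break;
                default:                                  // MAIN
                    if (ch == '/' && i + 1 < code.length() && code.charAt(i + 1) == '/') {
                        int end = code.indexOf('\n', i);  // comment runs to end of line
                        if (end < 0) {
                            end = code.length();          // or to end of input
                        }
                        // "//" (2 chars) becomes "{" + "}" (2 chars): same length
                        out.append('{').append(code, i + 2, end).append('}');
                        i = end;
                    } else {
                        if (ch == '"' || ch == '\'') {
                            state = State.STRING;
                            quote = ch;
                        } else if (ch == '\\') {
                            state = State.PATTERN;
                        }
                        out.append(ch);
                        i++;
                    }
            }
        }
        return out.toString();
    }
}
```

With the masked text, a walker can enter COMMENT on { and leave it on } when going forwards, and do the opposite when going backwards. Because the masked copy has the same length as the input, indexes found on it apply to the original text as well, so one option is to scan the masked copy and slice the original string, which would make the reverse transform unnecessary.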