
JavaScript: split a string into 4 pieces and leave the rest as one big piece


I'm building a JavaScript chat bot, and I ran into an issue:
I use string.split() to tokenize my input like this:
tokens = message.split(" ");

My problem is that I need 4 tokens for the command and 1 token for the message. When I send this: !finbot msg testuser 12345 Hello sir, this is a test message

these are the tokens I get: ["!finbot", "msg", "testuser", "12345", "Hello", "sir,", "this", "is", "a", "test", "message"]

How can I get this instead: ["!finbot", "msg", "testuser", "12345", "Hello sir, this is a test message"]

The reason I want it like this is that the first token (tokens[0]) is the call, the second (tokens[1]) is the command, the third (tokens[2]) is the user, the fourth (tokens[3]) is the password (it's a password-protected message thing, just for fun), and the fifth (tokens[4]) is the actual message.
Right now, it would just send Hello, because I only use the fifth token.
The reason I can't just write message = tokens[4] + tokens[5]; and so on is that messages don't have a fixed number of words.

I hope I gave enough information for you to help me. If you guys know the answer (or know a better way to do this) please tell me so.

Thanks!


Solution

  • You could resort to a regexp, given that you defined your format as "4 non-space tokens separated by single spaces, followed by the message":

    function tokenize(msg) {
        return (/^(\S+) (\S+) (\S+) (\S+) (.*)$/.exec(msg) || []).slice(1, 6);
    }
    

    This has the perhaps unwanted behaviour of returning an empty array if msg does not actually match the format. If that's not acceptable, remove the ... || [] fallback and handle a failed match yourself (exec() returns null when nothing matches). The number of tokens is also fixed at 4 plus the required message. For a more generic approach you could:

    function tokenizer(msg, nTokens) {
        var token = /(\S+)\s*/g, tokens = [], match;
    
        while (nTokens && (match = token.exec(msg))) {
            tokens.push(match[1]);
            nTokens -= 1; // or nTokens--, whichever is your style
        }
    
        if (nTokens) {
            // exec() returned null, could not match enough tokens
            throw new Error('EOL when reading tokens');
        }
    
        tokens.push(msg.slice(token.lastIndex));
        return tokens;
    }
    

    This uses the global (g) flag of regexp objects in JavaScript to run exec() against the same string repeatedly, with each call resuming where the previous match ended. It then uses the regexp's lastIndex property to slice off everything after the last matched token as the rest.
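To make the lastIndex behaviour concrete, here is a minimal sketch that steps through two exec() calls by hand. The variable names (first, afterFirst, and so on) are only for illustration and are not part of the answer's code.

```javascript
// Sketch: a regexp with the g flag remembers where its last match ended.
var token = /(\S+)\s*/g;
var msg = '!finbot msg testuser';

var first = token.exec(msg);        // matches '!finbot ' (token plus trailing space)
var afterFirst = token.lastIndex;   // position right after that match

var second = token.exec(msg);       // resumes from lastIndex, matches 'msg '
var afterSecond = token.lastIndex;

// Everything after the last matched token is the "rest".
var rest = msg.slice(token.lastIndex);

console.log(first[1], afterFirst);   // !finbot 8
console.log(second[1], afterSecond); // msg 12
console.log(rest);                   // testuser
```

Because the trailing \s* is part of each match, the slice starts at the first character of the remaining text rather than at a space.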

    Given

    var msg = '!finbot msg testuser 12345 Hello sir, this is a test message';
    

    then

    > tokenizer(msg, 4)
    [ '!finbot',
      'msg',
      'testuser',
      '12345',
      'Hello sir, this is a test message' ]
    > tokenizer(msg, 3)
    [ '!finbot',
      'msg',
      'testuser',
      '12345 Hello sir, this is a test message' ]
    > tokenizer(msg, 2)
    [ '!finbot',
      'msg',
      'testuser 12345 Hello sir, this is a test message' ]
    

    Note that the final "message" element is always appended to the returned array, so it will be an empty string if the given string contains only tokens:

    > tokenizer('asdf', 1)
    [ 'asdf', '' ]  // An empty "message" at the end
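
If that trailing empty string is not wanted, one possible variation is to append the rest only when it is non-empty. This is a sketch under that assumption; tokenizerSkipEmpty is a hypothetical name, and whether an empty message should be dropped or kept depends on how the bot uses the result.

```javascript
// Sketch only: same approach as tokenizer above, but an empty rest is not appended.
// tokenizerSkipEmpty is a hypothetical name used for illustration.
function tokenizerSkipEmpty(msg, nTokens) {
    var token = /(\S+)\s*/g, tokens = [], match, rest;

    // Pull off up to nTokens whitespace-delimited tokens.
    while (nTokens && (match = token.exec(msg))) {
        tokens.push(match[1]);
        nTokens -= 1;
    }

    if (nTokens) {
        // exec() returned null before enough tokens were read
        throw new Error('EOL when reading tokens');
    }

    // Keep the remainder only if there is something in it.
    rest = msg.slice(token.lastIndex);
    if (rest !== '') {
        tokens.push(rest);
    }
    return tokens;
}

console.log(tokenizerSkipEmpty('asdf', 1));             // [ 'asdf' ]
console.log(tokenizerSkipEmpty('asdf hello there', 1)); // [ 'asdf', 'hello there' ]
```

With this variant the caller has to check the array length to find out whether a message was supplied, instead of testing for an empty string.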