I am creating a tokenization system in Kotlin / JVM that takes in a file and returns each char or sequence of chars as a token. For some reason, whenever I tokenize a string, it finds the second instance of a "string" token before moving on to the next token; in other words, the tokens are not in order. I think it might have to do with the loop, but I just can't figure it out. I am still learning Kotlin, so if anyone could give me pointers as well, that'd be great! Any help is much appreciated.
Output of tokens:
[["chello", string], ["tomo", string], [:, keyV], ["hunna", string], ["moobes", string], ["hunna", string]]
My file looks like this.
STORE "chello" : "tomo" as 1235312
SEND "hunna" in Hollo
GET "moobes"
GET "hunna"
fun tokenCreator (file: BufferedReader) {
    var lexicon : String = file.readText()
    val numRegex = Regex("^[1-9]\\d*(\\.\\d+)?\$")
    val dataRegex = Regex("[(){}]")
    val token = mutableListOf<List<Any>>()
    for((index, char) in lexicon.withIndex()) {
        println(char)
        when {
            char.isWhitespace() -> continue
            char.toString() == ":" -> token.add(listOf(char.toString(), "keyV") )
            char.toString().matches(Regex("[()]")) -> token.add(listOf(char, "group") )
            char.toString().matches(dataRegex) -> token.add(listOf(char, "data_group" ) )
            char == '>' -> token.add(listOf(char.toString(), "verbline") )
            char == '"' -> {
                var stringOf = ""
                val firstQuote = lexicon.indexOf(char)
                val secondQuote = lexicon.indexOf(char, firstQuote + 1)
                if(firstQuote == -1 || secondQuote == -1) {
                    break
                }
                for(i in firstQuote..secondQuote) {
                    stringOf += lexicon[i]
                }
                lexicon = lexicon.substring(secondQuote + 1, lexicon.length)
                token.add(listOf(stringOf, "string"))
            }
        }
    }
    println(token)
}
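Changing the content while iterating seems like a recipe for confusion: lexicon.withIndex() is evaluated once, so the loop keeps walking the original text, while indexOf searches the shortened string you assign to lexicon inside the loop.
You also don't seem to increment the index to skip over consumed content. I'd recommend changing the loop so that you can skip over content you have already consumed, for example by using a while loop and advancing the index yourself.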
I'd also remove this line:
lexicon = lexicon.substring(secondQuote + 1, lexicon.length)
Then replace
val firstQuote = lexicon.indexOf(char)
with
val firstQuote = index
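To see why this matters, here is a minimal sketch, separate from the tokenizer itself. The helper names literalFromStart and literalAt are made up for this illustration. indexOf(char) without a start index always finds the first quote in the text, no matter where the loop currently is, whereas starting from the current index finds the literal you are actually standing on.

```kotlin
// Hypothetical helpers for illustration only; they are not part of the original code.

// Mirrors the original approach: searches from the beginning of the text.
fun literalFromStart(text: String): String {
    val first = text.indexOf('"')              // always the first quote in the text
    val second = text.indexOf('"', first + 1)
    return text.substring(first, second + 1)
}

// Uses the current loop index as the opening quote.
fun literalAt(text: String, index: Int): String {
    val second = text.indexOf('"', index + 1)  // search starts after the current quote
    return text.substring(index, second + 1)
}

fun main() {
    val text = "GET \"a\" GET \"b\""
    val index = 12 // position of the quote that opens "b"
    println(literalFromStart(text)) // "a"
    println(literalAt(text, index)) // "b"
}
```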
You can also use substring instead of iterating to build stringOf:
val stringOf = lexicon.substring(
Moreover, using toString to check for ':' seems inefficient; you can compare the Char directly, as you already do with char == '>'.
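Putting these suggestions together, one way it could look is sketched below. This is only a sketch under a few assumptions: the Token data class and the name tokenize are mine (your code uses List<Any> and tokenCreator), the function takes the text as a String (you could pass file.readText()), and anything that is not whitespace, a symbol or a string literal is simply skipped, as in your version.

```kotlin
// Hypothetical token type for this sketch; the original code uses List<Any>.
data class Token(val text: String, val type: String)

fun tokenize(source: String): List<Token> {
    val tokens = mutableListOf<Token>()
    var i = 0
    // A manual index lets each branch decide how far to advance.
    loop@ while (i < source.length) {
        val c = source[i]
        when {
            c.isWhitespace() -> i++
            c == ':' -> { tokens.add(Token(":", "keyV")); i++ }
            c == '(' || c == ')' -> { tokens.add(Token(c.toString(), "group")); i++ }
            c == '{' || c == '}' -> { tokens.add(Token(c.toString(), "data_group")); i++ }
            c == '>' -> { tokens.add(Token(">", "verbline")); i++ }
            c == '"' -> {
                // The opening quote is the current position; look for the closing one after it.
                val close = source.indexOf('"', i + 1)
                if (close == -1) break@loop // unterminated string: stop, as the original does
                tokens.add(Token(source.substring(i, close + 1), "string"))
                i = close + 1 // skip over everything that was just consumed
            }
            else -> i++ // keywords, numbers, etc. are ignored here, as in the original
        }
    }
    return tokens
}

fun main() {
    val input = "STORE \"chello\" : \"tomo\" as 1235312\nGET \"hunna\""
    tokenize(input).forEach(::println)
}
```

Because the text is never modified and the index only moves forward, the tokens should come out in the order they appear in the file.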