I'm writing a parser and need to convert strings containing escape sequences into the characters they represent. For example, if I have the string `"\n"` (a backslash followed by the letter `n`), I want an algorithm that turns it into the line-break character `'\n'`.

The straightforward way is to create a map from each literal escape sequence to the character it stands for, but this doesn't scale to Unicode escapes (for example, turning `"\uFF00"` into `'\uFF00'`), since there are 65,536 of them.

Is there a way to solve this other than creating a map?
You can do something like the following: walk through the string, and whenever you encounter a `\` character, try to parse an escape sequence beginning with it.

```kotlin
data class CharacterLiteral(
    val value: Char?, // the decoded character, or null if the escape is invalid
    val length: Int,  // number of source characters consumed, including the backslash
)

fun String.format(): String {
    val builder = StringBuilder()
    var i = 0
    while (i < this.length) {
        val ch = this[i]
        if (ch == '\\') {
            val characterLiteral = parseEscapedCharacter(i + 1)
            if (characterLiteral.value == null) {
                // Invalid escape: drop the backslash, keep the following character as-is
                i++
            } else {
                builder.append(characterLiteral.value)
                i += characterLiteral.length
            }
        } else {
            builder.append(ch)
            i++
        }
    }
    return builder.toString()
}

// `index` points at the character right after the backslash
fun String.parseEscapedCharacter(index: Int): CharacterLiteral {
    return when (getOrNull(index)) {
        'u' -> parseUnicodeCharacter(index + 1)
        't' -> CharacterLiteral('\t', 2)
        'b' -> CharacterLiteral('\b', 2)
        'n' -> CharacterLiteral('\n', 2)
        'r' -> CharacterLiteral('\r', 2)
        '\'' -> CharacterLiteral('\'', 2)
        '"' -> CharacterLiteral('"', 2)
        '\\' -> CharacterLiteral('\\', 2)
        '$' -> CharacterLiteral('$', 2)
        else -> CharacterLiteral(null, 0) // invalid case
    }
}

// `index` points at the first of the four hex digits after "\u"
fun String.parseUnicodeCharacter(index: Int): CharacterLiteral {
    // unicode: \u0000 to \uFFFF, always exactly four ASCII hex digits
    for (i in 0 until 4) {
        val ch = getOrNull(index + i)
        if (ch == null || (ch !in '0'..'9' && ch.lowercaseChar() !in 'a'..'f')) {
            return CharacterLiteral(null, 0) // invalid
        }
    }
    // Source: https://stackoverflow.com/a/45273638/8822610
    // parseInt handles leading zeros itself, so "0000" correctly yields 0
    val unicodeNumber = this.slice(index..(index + 3))
    // Character.toString(int) requires Java 11+
    val string = Character.toString(Integer.parseInt(unicodeNumber, 16))
    return CharacterLiteral(string[0], 6) // 6 = backslash + 'u' + four hex digits
}
```
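The `\uXXXX` branch boils down to "read exactly four hex digits and turn them into a `Char`". Below is a minimal stand-alone sketch of just that step; the name `decodeHex4` is hypothetical and not part of the answer's code. It checks the digits explicitly because Kotlin's `String.toIntOrNull(16)` on its own also accepts a leading sign such as `"+041"`, which is not a valid escape.

```kotlin
// Hypothetical helper (not part of the answer's code): decodes exactly four ASCII hex
// digits, i.e. the "XXXX" of a \uXXXX escape, or returns null if the input is not that.
fun decodeHex4(digits: String): Char? {
    // Accept only ASCII 0-9, a-f and A-F; this rejects signs like "+041",
    // which toIntOrNull(16) by itself would parse as 0x41.
    val isAsciiHex = { c: Char -> c in '0'..'9' || c in 'a'..'f' || c in 'A'..'F' }
    if (digits.length != 4 || !digits.all(isAsciiHex)) return null
    // Any value from 0x0000 to 0xFFFF fits in a single UTF-16 Char.
    return digits.toInt(16).toChar()
}
```

For instance, `decodeHex4("0041")` gives `'A'`, while `decodeHex4("+041")` and `decodeHex4("12345")` give `null`.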
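You can check the result by comparing the output of the `format` function with an equivalent Kotlin string literal:

```kotlin
fun main() {
    val s1 = "bla '\\u0065' '\\t' '\\n' '\\\\'" // escapes spelled out as plain characters
    val s2 = "bla '\u0065' '\t' '\n' '\\'"      // escapes already interpreted by the compiler
    println(s1.format() == s2) // true
}
```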
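If the index arithmetic feels error-prone, the same escapes can also be handled with a regular expression and `Regex.replace` with a transform lambda. This is a swapped-in technique, not what the answer above does, and the names `ESCAPE_SEQUENCE` and `unescapeWithRegex` are hypothetical. One behavioral difference: an unrecognized sequence such as `\q` is simply not matched, so its backslash is kept rather than dropped.

```kotlin
// Matches a backslash followed by either "u" plus four hex digits, or one of the
// single-character Kotlin escapes t, b, n, r, ', ", \ and $.
val ESCAPE_SEQUENCE = Regex("""\\(u[0-9a-fA-F]{4}|[tbnr'"\\$])""")

// Hypothetical helper: replaces every recognized escape with the character it denotes.
// Unrecognized sequences such as \q are not matched and stay verbatim.
fun String.unescapeWithRegex(): String = ESCAPE_SEQUENCE.replace(this) { match ->
    val body = match.groupValues[1]
    when (body[0]) {
        'u' -> body.substring(1).toInt(16).toChar().toString() // regex guarantees 4 hex digits
        't' -> "\t"
        'b' -> "\b"
        'n' -> "\n"
        'r' -> "\r"
        else -> body // ', ", \ and $ stand for themselves
    }
}
```

Because matches are found left to right without overlapping, an escaped backslash followed by `u0065` (the six characters `\\u0065`) becomes `\u0065` as plain text rather than `e`, which matches how the compiler reads the same literal.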