Kotlin converting "literal escapes" to escaped characters


I'm writing a parser and need to convert escape sequences in strings to the characters they represent. For example, given the string "\n" (a backslash followed by the letter n), I want an algorithm that turns it into the line break '\n'.

The straightforward way is to create a map from each literal escape sequence to its character, but that approach breaks down for Unicode escapes (for example, "\uFF00" to '\uFF00'), because the map would need an entry for every possible code unit.

Is there any other way to solve this rather than creating a map?


Solution

  • You can do something like the following.

    1. Write a function that iterates over the string character by character. When it encounters a \ character, it tries to parse the escape sequence that starts there.
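    // Result of parsing one escape sequence.
    // value:  the decoded character, or null if the sequence is invalid
    // length: number of source characters consumed, including the backslash
    data class CharacterLiteral(
        val value: Char?,
        val length: Int,
    )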
    
    fun String.format(): String {
        val builder = StringBuilder()
    
        var i = 0
        while (i < this.length) {
            val ch = this[i]
            if (ch == '\\') {
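                // i + 1 is the index of the character right after the backslash
                val characterLiteral = parseEscapedCharacter(i + 1)
                if (characterLiteral.value == null) {
                    // invalid escape: the backslash is dropped and parsing resumes at the next character
                    i++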
                } else {
                    builder.append(characterLiteral.value)
                    i += characterLiteral.length
                }
            } else {
                builder.append(ch)
                i++
            }
        }
    
        return builder.toString()
    }
    
    2. Parse the character that follows the backslash.
    fun String.parseEscapedCharacter(index: Int): CharacterLiteral {
        return when (getOrNull(index)) {
            'u' -> parseUnicodeCharacter(index + 1)
            't' -> CharacterLiteral('\t', 2)
            'n' -> CharacterLiteral('\n', 2)
            '\'' -> CharacterLiteral('\'', 2)
            '"' -> CharacterLiteral('"', 2)
            '\\' -> CharacterLiteral('\\', 2)
            else -> CharacterLiteral(null, 0) // invalid case
        }
    }
    
    3. Parse the Unicode escape (four hex digits after \u).
    fun String.parseUnicodeCharacter(index: Int): CharacterLiteral {
        // unicode: \u0000 to \uFFFF
        for (i in 0..<4) {
            val ch = getOrNull(index + i)
            if (ch == null || ch.lowercaseChar() !in "0123456789abcdef") {
                return CharacterLiteral(null, 0) // invalid
            }
        }
    
        // Source: https://stackoverflow.com/a/45273638/8822610
        // parseInt accepts leading zeros, so the digits are parsed as-is
        // (trimming them would turn "0000" into "" and throw).
        val codeUnit = Integer.parseInt(substring(index, index + 4), 16)
        return CharacterLiteral(codeUnit.toChar(), 6) // 6 = \, u, and four hex digits
    }
    

    You can check the result of the format function against an ordinary string literal that contains the same escapes.

    fun main() {
        val s1 = "bla '\\u0065' '\\t' '\\n' '\\\\'"
        val s2 = "bla '\u0065' '\t' '\n' '\\'"
        println(s1.format() == s2) // true
    }
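
For comparison, the same idea can be sketched more compactly by combining a small map for the single-character escapes with a regular expression for the \uXXXX form. This is only a rough alternative, not the approach from the answer above: the names unescape, simpleEscapes and escapePattern are made up for illustration, and the choice to leave unknown escapes untouched is an assumption you may want to change.

```kotlin
// Hypothetical names, chosen for this sketch only.
// Single-character escapes: the character after the backslash -> the decoded character.
private val simpleEscapes = mapOf(
    't' to '\t',
    'n' to '\n',
    '\'' to '\'',
    '"' to '"',
    '\\' to '\\',
)

// Matches a backslash followed by either uXXXX (group 1 = four hex digits)
// or any single character (group 2).
private val escapePattern =
    Regex("""\\(?:u([0-9a-fA-F]{4})|(.))""", RegexOption.DOT_MATCHES_ALL)

fun unescape(input: String): String =
    escapePattern.replace(input) { match ->
        val hex = match.groups[1]?.value
        if (hex != null) {
            // Unicode escape: parse the four hex digits as one UTF-16 code unit.
            hex.toInt(16).toChar().toString()
        } else {
            val ch = match.groups[2]!!.value[0]
            // Assumption: unknown escapes such as \q are kept as written.
            simpleEscapes[ch]?.toString() ?: match.value
        }
    }

fun main() {
    println(unescape("a\\tb\\u0041\\\\n") == "a\tbA\\n") // true
}
```

The map only has to cover the handful of single-character escapes, while the regular expression handles every \uXXXX value, so no per-code-unit entries are needed.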