android-studio kotlin data-conversion

My Kotlin code converts Latin alphabet characters to binary, but fails on non-Latin characters (Amharic, Arabic, etc.)


Hello everyone. My code converts Latin alphabet characters to binary, but it crashes when I try to convert non-Latin characters. Can you help me make it work for every alphabet?

fun strToBinary(str: String): String {
    val builder = StringBuilder()

    for (c in str.toCharArray()) {
        val toString = c.code.toString(2) // get char value in binary
        builder.append(String.format("%08d", Integer.parseInt(toString))) // we complete to have 8 digits
    }

    return builder.toString()
}

When I try non-Latin characters, it throws this exception:

Exception in thread "main" java.lang.NumberFormatException: For input string: "11000100011"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:583)
    at java.lang.Integer.parseInt(Integer.java:615)
    at BinaryKt.strToBinary(Binary.kt:6)
    at BinaryKt.main(Binary.kt:41)
    at BinaryKt.main(Binary.kt)

Solution

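  • You wrongly assumed that a character always fits in 1 byte. This is only true for ASCII; in UTF-8, other characters take 2 to 4 bytes each. The crash itself comes from Integer.parseInt, which reads the binary digits as a decimal number: "11000100011" read as decimal is about 11 billion, which is larger than Int.MAX_VALUE (2147483647).

    I suggest first encoding the whole string into a ByteArray using UTF-8 (the default charset of toByteArray()) and then converting it byte by byte: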

    fun strToBinary(str: String) = buildString {
        str.toByteArray().forEach {
            append(it.toUByte()
                .toString(2)
                .padStart(8, '0')
            )
        }
    }
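
    To make the byte-count point concrete, here is a minimal sketch that reports how many UTF-8 bytes each character of a string occupies. The helper name utf8ByteCounts is hypothetical and not part of the original answer; it walks the string by Unicode code point so that characters stored as surrogate pairs are counted once.

    ```kotlin
    // Hypothetical helper for illustration only (not from the original answer).
    // Returns the number of UTF-8 bytes used by each Unicode code point in `text`.
    fun utf8ByteCounts(text: String): List<Int> {
        val counts = mutableListOf<Int>()
        var i = 0
        while (i < text.length) {
            val codePoint = text.codePointAt(i)
            // A code point outside the Basic Multilingual Plane spans two Kotlin Chars.
            val charCount = Character.charCount(codePoint)
            counts += text.substring(i, i + charCount).toByteArray(Charsets.UTF_8).size
            i += charCount
        }
        return counts
    }
    ```

    For example, utf8ByteCounts("Aب") returns [1, 2]: the Latin letter takes one byte and the Arabic letter takes two. An Amharic letter such as አ takes three bytes, which is why a fixed 8-digit field per character cannot work.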
    

    Note that such an encoding takes quite a lot of space. The resulting string is at least 8 times longer than the original text. You can use hex encoding to make it 4 times shorter:

    fun strToHex(str: String) = buildString {
        str.toByteArray().forEach {
            append(it.toUByte()
                .toString(16)
                .padStart(2, '0')
            )
        }
    }
    

    Or make it even shorter using base64 encoding:

    import java.util.Base64
    fun strToBase64(str: String) = Base64.getEncoder().encodeToString(str.toByteArray())
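
    To compare the three encodings side by side, the following sketch computes the output length of each for the same input. The function name encodedLengths is an assumption made for this illustration; it applies the same conversions as the functions above to the UTF-8 bytes of the text.

    ```kotlin
    import java.util.Base64

    // Hypothetical helper for illustration only (not from the original answer).
    // Returns the lengths of the binary, hex and Base64 encodings of `text`.
    fun encodedLengths(text: String): Triple<Int, Int, Int> {
        val bytes = text.toByteArray(Charsets.UTF_8)
        // 8 binary digits per byte
        val binary = bytes.joinToString("") { it.toUByte().toString(2).padStart(8, '0') }
        // 2 hex digits per byte
        val hex = bytes.joinToString("") { it.toUByte().toString(16).padStart(2, '0') }
        // 4 Base64 characters per 3 bytes, padded with '='
        val base64 = Base64.getEncoder().encodeToString(bytes)
        return Triple(binary.length, hex.length, base64.length)
    }
    ```

    For "Hello" (5 bytes) this gives (40, 10, 8): 40 binary digits, 10 hex digits and 8 Base64 characters.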
    

    Update

    To decode the string we basically need to reverse all the steps. For example, to decode from binary we chunk the string into 8-character parts, decode each part into a single byte, build a byte array from those bytes, and then decode the array into a string using UTF-8:

    fun binaryToStr(binary: String) =
        binary.chunked(8)
            .map { it.toUByte(2).toByte() }
            .toByteArray()
            .decodeToString()
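
    The same reversal applies to the hex and Base64 variants. The sketch below is one possible way to write them; the names hexToStr and base64ToStr are hypothetical and do not appear in the original answer. It assumes the input was produced by the encoders above, so the hex string has an even length and contains only valid hex digits.

    ```kotlin
    import java.util.Base64

    // Hypothetical counterpart to strToHex: two hex digits per byte.
    // toUByte(16) throws NumberFormatException if a chunk is not valid hex.
    fun hexToStr(hex: String): String =
        hex.chunked(2)
            .map { it.toUByte(16).toByte() }
            .toByteArray()
            .decodeToString()

    // Hypothetical counterpart to strToBase64, using the standard Base64 decoder.
    fun base64ToStr(encoded: String): String =
        Base64.getDecoder().decode(encoded).decodeToString()
    ```

    For example, hexToStr("e18aa0") returns the Amharic letter አ, and base64ToStr("SGVsbG8=") returns "Hello".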