
spark binary (byte array) to get bytes as string

I have the following data frame:

scala> res1.printSchema
 |-- REC: binary (nullable = true)

|REC                   |
|[75 00 01 00 4C 12 10]|

Now my requirement is to get a string of "75 00 01 00 4C 12 10" from the binary type. Please help.

I tried mkString(" "), but it seems to convert the bytes to standard ASCII. I want the literal bytes as a string, i.e. "75 00 01 00 4C 12 10".


  • Possibly you just want hex, but if you really want the spaces then:

    val df = sparkSession.sql("select cast('i am text' as binary) bytes")
    val castedToString = df.selectExpr("cast(bytes as string) casted")
    val hexed = df.selectExpr("hex(bytes) hexString")
    val prettyString = hexed.selectExpr("rtrim(regexp_replace(hexString,'(.{2})', '$1 ')) perToPrettyString")


    |               bytes|
    |[69 20 61 6D 20 7...|

    |   casted|
    |i am text|

    |         hexString|
    |6920616D2074657874|

    |   perToPrettyString|
    |69 20 61 6D 20 74...|

    The last expression replaces every two characters by themselves and an additional space, then removes the last trailing space.
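To see what that regex does without starting Spark, here is a minimal plain-Scala sketch of the same idea applied to an ordinary string. The object and method names are hypothetical, and `stripSuffix(" ")` stands in for SQL's `rtrim`.

```scala
object RegexSpacing {
  // Append a space after every pair of characters, then drop the single
  // trailing space that the last pair leaves behind.
  def spaceEveryTwo(hex: String): String =
    hex.replaceAll("(.{2})", "$1 ").stripSuffix(" ")
}
```

For the hex string of 'i am text', this turns "6920616D2074657874" into "69 20 61 6D 20 74 65 78 74".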


    When Spark performs ".show", each column is passed through a new internal expression, ToPrettyString, which by the default inherited from ToStringBase translates binary into hex (via SparkStringUtils.getHexString) and wraps the result in square brackets: each byte is formatted with "%02X".format(_) and the results are joined with mkString("[", " ", "]").
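If the bytes are already available on the driver as an Array[Byte], the same per-byte formatting can be reused without the brackets. This is a sketch in plain Scala, not Spark's own code; the object and method names are made up for illustration.

```scala
object ShowStyleHex {
  // Format each byte as two uppercase hex digits and join with single
  // spaces (no surrounding square brackets). For a Byte argument, %X
  // prints negative values as unsigned, so -1 becomes "FF".
  def toSpacedHex(bytes: Array[Byte]): String =
    bytes.map("%02X".format(_)).mkString(" ")
}
```

Applied to the bytes from the question, this should give "75 00 01 00 4C 12 10".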

    Cast overrides this default:

      override protected def useHexFormatForBinary: Boolean = false

    The hex sql function calls Hex.hex:

      def hex(bytes: Array[Byte]): UTF8String = {
        val length = bytes.length
        val value = new Array[Byte](length * 2)
        var i = 0
        while (i < length) {
          value(i * 2) = Hex.hexDigits((bytes(i) & 0xF0) >> 4)
          value(i * 2 + 1) = Hex.hexDigits(bytes(i) & 0x0F)
          i += 1
        }
        UTF8String.fromBytes(value)
      }

    You could also call this function from a udf, and if performance is important you probably should do that instead of using a regex.
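As a rough illustration of the non-regex route, the sketch below uses the same lookup-table approach as Hex.hex but writes the separating spaces directly, so no second pass is needed. It is plain Scala with hypothetical names (SpacedHex, spacedHex), not part of Spark.

```scala
object SpacedHex {
  private val hexDigits: Array[Char] = "0123456789ABCDEF".toCharArray

  // Convert bytes to "XX XX XX" in a single pass: two hex digits per byte,
  // with one space between consecutive bytes and none at the end.
  def spacedHex(bytes: Array[Byte]): String =
    if (bytes == null) null          // keep nulls as nulls (nullable column)
    else if (bytes.isEmpty) ""
    else {
      val out = new Array[Char](bytes.length * 3 - 1)
      var i = 0
      while (i < bytes.length) {
        val b = bytes(i) & 0xFF      // treat the byte as unsigned
        out(i * 3) = hexDigits(b >> 4)
        out(i * 3 + 1) = hexDigits(b & 0x0F)
        if (i < bytes.length - 1) out(i * 3 + 2) = ' '
        i += 1
      }
      new String(out)
    }
}
```

Assuming a Spark session is available, something like `udf(SpacedHex.spacedHex _)` from org.apache.spark.sql.functions could then be applied to the binary column. That wiring is untested here, and whether it actually beats the regex would need measuring on real data.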