I need to extract data from a binary file.
I used binaryRecords and got an RDD[Array[Byte]].
From here I want to parse every record into
case class (Field1: Int, Field2: Short, Field3: Long)
How can I do this?
Assuming your records have no delimiter: in Scala an Int is 4 bytes, a Short is 2 bytes, and a Long is 8 bytes. Assume each record in your binary data is laid out as Int, Short, Long, so a record is 14 bytes long. You should be able to slice the bytes by field width and convert each slice to the type you want.
import java.nio.ByteBuffer
// Slice each record by field width (4, 2, 8 bytes) and decode each slice.
val result = YourRDD.map(x => (ByteBuffer.wrap(x.take(4)).getInt,
  ByteBuffer.wrap(x.drop(4).take(2)).getShort,
  ByteBuffer.wrap(x.drop(6)).getLong))
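This uses java.nio.ByteBuffer from the Java standard library to convert the bytes to Int/Short/Long; you can use other libraries if you prefer. Note that ByteBuffer reads big-endian by default, and that the result here is an RDD of tuples rather than of your case class.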
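If you want the case class directly, one possible variation is to wrap each record once and read the fields in sequence. This is only a sketch: the names Record and parseRecord are made up for illustration (the case class in the question is unnamed), and it assumes the same 14-byte Int, Short, Long layout in big-endian order.

```scala
import java.nio.ByteBuffer

// Hypothetical name; the question does not name its case class.
case class Record(field1: Int, field2: Short, field3: Long)

// Decode one record laid out as a 4-byte Int, 2-byte Short, 8-byte Long.
def parseRecord(bytes: Array[Byte]): Record = {
  require(bytes.length >= 14, s"expected at least 14 bytes, got ${bytes.length}")
  // Big-endian unless you call .order(java.nio.ByteOrder.LITTLE_ENDIAN).
  val buf = ByteBuffer.wrap(bytes)
  // Each relative get advances the buffer position, so fields are read in order.
  val field1 = buf.getInt()
  val field2 = buf.getShort()
  val field3 = buf.getLong()
  Record(field1, field2, field3)
}

// With Spark, assuming `rdd` is the RDD[Array[Byte]] returned by binaryRecords:
// val records = rdd.map(parseRecord)
```

Wrapping the array once avoids the intermediate copies that take and drop create for every field, which may matter on large inputs.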