I need to publish a Google Cloud Pub/Sub message with multiple jpeg images. It needs to go in the data body. Putting it as a base64-encoded string in an attribute won't work, because attribute values are limited to 1024 bytes: https://cloud.google.com/pubsub/quotas#resource_limits
What's a simple and reliable pattern for doing that? It might seem possible to choose some fixed delimiter, but I want to avoid the possibility of that delimiter occurring inside an image. Is it possible something like ||||
might occur in a jpeg byte array? Another possibility might seem to encode as multi-part mime, but I haven't found any general-purpose non-http libraries to do that. I need implementations in both Java/Scala and Python. Or maybe can I just concatenate the jpeg byte arrays without any external delimiter, and split them based on header identifiers?
It looks like the following approach may work, written in Scala, using only natural delimiters:
def serializeJpegs(jpegs: Seq[Array[Byte]]): Array[Byte] =
jpegs.foldLeft(Array.empty[Byte])(_ ++ _)
def deserializeJpegs(bytes: Array[Byte]): Seq[Array[Byte]] = {
val JpegHeader = Array(0xFF.toByte, 0xD8.toByte)
val JpegFooter = Array(0xFF.toByte, 0xD9.toByte)
val Delimiter = JpegFooter ++ JpegHeader
val jpegs: mutable.Buffer[Array[Byte]] = mutable.Buffer.empty
var (start, end) = (0, 0)
end = bytes.indexOfSlice(Delimiter, start) + JpegFooter.length
while (end > JpegFooter.length) {
jpegs += bytes.slice(start, end)
start = end
end = bytes.indexOfSlice(Delimiter, start) + JpegFooter.length
}
if (start < bytes.length) {
jpegs += bytes.drop(start)
}
jpegs
}
I'm sure there's a more efficient and functional implementation, but that's for another day!