Tags: multipartform-data, multipart, google-cloud-pubsub

What's the simplest reliable way to encode multiple JPEG images in a single byte string?


I need to publish a Google Cloud Pub/Sub message containing multiple JPEG images. The images have to go in the message's data body. Putting them in an attribute as a base64-encoded string won't work, because attribute values are limited to 1024 bytes: https://cloud.google.com/pubsub/quotas#resource_limits

What's a simple and reliable pattern for doing that? One option is a fixed delimiter, but I want to avoid any chance of that delimiter occurring inside an image. Could something like |||| occur in a JPEG byte array? Another option is to encode the images as multipart MIME, but I haven't found any general-purpose, non-HTTP libraries for that. I need implementations in both Java/Scala and Python. Or can I simply concatenate the JPEG byte arrays without any external delimiter, and split them based on header identifiers?


Solution

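  • The following Scala approach appears to work, using only the JPEG format's natural delimiters. A well-formed JPEG stream starts with a start-of-image marker (FF D8) and ends with an end-of-image marker (FF D9), so the images are concatenated as-is and split wherever FF D9 is immediately followed by FF D8: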

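      import scala.collection.mutable

      // Concatenate the images back to back; no extra delimiter is added.
      def serializeJpegs(jpegs: Seq[Array[Byte]]): Array[Byte] =
        Array.concat(jpegs: _*)

      // Split wherever an end-of-image marker (FF D9) is immediately
      // followed by a start-of-image marker (FF D8).
      def deserializeJpegs(bytes: Array[Byte]): Seq[Array[Byte]] = {
        val JpegHeader = Array(0xFF.toByte, 0xD8.toByte)
        val JpegFooter = Array(0xFF.toByte, 0xD9.toByte)
        val Delimiter = JpegFooter ++ JpegHeader

        val jpegs = mutable.Buffer.empty[Array[Byte]]
        var start = 0
        var boundary = bytes.indexOfSlice(Delimiter, start)

        while (boundary >= 0) {
          // The footer belongs to the current image, the header to the next.
          val end = boundary + JpegFooter.length
          jpegs += bytes.slice(start, end)
          start = end
          boundary = bytes.indexOfSlice(Delimiter, start)
        }

        // Whatever remains after the last boundary is the final image.
        if (start < bytes.length) {
          jpegs += bytes.drop(start)
        }

        // Convert explicitly: a mutable.Buffer is not a Seq on Scala 2.13+.
        jpegs.toSeq
      }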
    

    I'm sure there's a more efficient and functional implementation, but that's for another day!
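
One caveat with natural delimiters: JPEG application and comment segments can hold arbitrary bytes, so FF D9 FF D8 is not strictly guaranteed to be absent from inside a single image. If that matters, a length prefix is one alternative that doesn't depend on JPEG internals at all. The following is only a sketch of that idea, not part of the approach above; the object name `LengthPrefixedFraming` and its method names are made up for illustration, and it assumes each image is smaller than 2 GB and that the input is well formed.

```scala
import java.nio.ByteBuffer
import scala.collection.mutable

// Hypothetical helper (name chosen for illustration only).
object LengthPrefixedFraming {

  // Write each payload as a 4-byte big-endian length followed by its bytes.
  def serialize(payloads: Seq[Array[Byte]]): Array[Byte] = {
    val total = payloads.map(p => 4 + p.length).sum
    val buf = ByteBuffer.allocate(total) // ByteBuffer is big-endian by default
    payloads.foreach { p =>
      buf.putInt(p.length)
      buf.put(p)
    }
    buf.array()
  }

  // Read a length, then exactly that many bytes, until the input is consumed.
  // Truncated or corrupt input makes the ByteBuffer reads throw.
  def deserialize(bytes: Array[Byte]): Seq[Array[Byte]] = {
    val buf = ByteBuffer.wrap(bytes)
    val out = mutable.Buffer.empty[Array[Byte]]
    while (buf.hasRemaining) {
      val len = buf.getInt()
      val payload = new Array[Byte](len)
      buf.get(payload)
      out += payload
    }
    out.toSeq
  }
}
```

Because the payload bytes are never scanned, this works for any binary content, and the same layout should be reproducible in Python with the `struct` module (format `">i"` for a signed 32-bit big-endian integer).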