I have a bunch of data that I'm storing as files on Cloudflare R2. I noticed early on that these files were approaching a total bucket size in the terabytes, so I applied Brotli compression, which brought it down to ~500 MB.
I am now trying to expose the data via Workers (to apply a filter) and have hit a snag. Cloudflare exposes the Web Streams API, which includes DecompressionStream, but it can decompress gzip, not Brotli.
I did convert the data to gzip, and this stream pipeline works:
// Decompress the gzip'd object, filter it, then re-compress the response
let stm = resp.body
  .pipeThrough(new DecompressionStream("gzip"))
  .pipeThrough(ApplyFilter(sDate, eDate))
  .pipeThrough(new CompressionStream("gzip"));
Gzip is not offering nearly the level of compression I got used to with Brotli.
261M 1158172.data (100%)
2.8M 1158172.data.gz ( 1%)
78K 1158172.data.br ( 0.03%)
UPDATE: I forgot to mention that I also tried converting the web stream to a Node stream and using Node's zlib.createBrotliDecompress. Unfortunately, it does not appear that Cloudflare supports zlib in Workers:

Uncaught Error: No such module "node:zlib".

So, two questions:
Is there a Brotli decompressor for Web Streams?
There is no support for Brotli in the (de)CompressionStream standard, but you could probably do it with WebAssembly.
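For example, a WASM build of Brotli such as the brotli-wasm npm package could do the decoding. A rough sketch, assuming its one-shot decompress() API and that the module loads under the Workers runtime (note this buffers the whole object in memory, so it only suits objects that fit within the Worker's memory limit):

import brotliPromise from 'brotli-wasm';

export default {
  async fetch(req, env, ctx) {
    const brotli = await brotliPromise;                // WASM module loads asynchronously
    const obj = await env.R2.get('1158172.data.br');

    // Illustrative query parameters for the date range
    const { searchParams } = new URL(req.url);
    const sDate = searchParams.get('s'), eDate = searchParams.get('e');

    // One-shot decode of the whole object into memory
    const plain = brotli.decompress(new Uint8Array(await obj.arrayBuffer()));

    // Re-wrap as a web stream so the filter + gzip pipeline from the question still applies
    const stm = new Blob([plain]).stream()
      .pipeThrough(ApplyFilter(sDate, eDate))          // ApplyFilter as in the question
      .pipeThrough(new CompressionStream('gzip'));

    return new Response(stm, {
      headers: { 'Content-Encoding': 'gzip' },
      encodeBody: 'manual'                             // body is already gzip-encoded
    });
  },
};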
Is there a way I can trick my Worker or R2 into auto decompressing?
Cloudflare will handle on-the-fly decompression itself if the client's Accept-Encoding header doesn't indicate support for the encoding given in your response's Content-Encoding header.

Just return the compressed file as-is, with the appropriate Content-Encoding header.
export default {
  async fetch(req, env, ctx) {
    // Fetch the Brotli-compressed object from the R2 binding
    const obj = await env.R2.get('result.br');

    // Pass the compressed body through untouched; encodeBody: 'manual' tells the
    // runtime the body already matches the Content-Encoding header
    return new Response(obj.body, {
      headers: {
        'Content-Encoding': 'br'
      },
      encodeBody: 'manual'
    });
  },
};
curl https://xxx.xxx.workers.dev/ --header "Accept-Encoding: br" --output - -vvv
< Content-Length: 15
< Content-Encoding: br
<binary content>
curl https://xxx.xxx.workers.dev/ --header "Accept-Encoding: identity" -vvv
<no content-length header, due to on-the-fly decompression>
<plain-text content>