I couldn't find this anywhere on the internet, so I figured I'd add it as documentation.
I wanted to join a JSON array on the non-printing character with decimal code 30 (0x1E,
"Record Separator") so I could safely iterate over it in bash, but I couldn't quite figure out how to do it. I tried echo '["one","two","three"]' | jq 'join("\30")'
and a couple of permutations of that, but it didn't work.
Turns out the solution is pretty simple... (see answer)
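Use jq -j (--join-output),
which suppresses the newline jq would otherwise print after each output, so the only delimiter in the stream is your own. Note that jq string literals take JSON-style escapes, so inside the jq program the separator is written "\u001e" rather than "\30"; alternatively, pass it in from the shell with --arg. This works in your simple case: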
#!/usr/bin/env bash
data='["one","two","three"]'
sep=$'\x1e' # works only for non-NUL characters, see NUL version below
while IFS= read -r -d "$sep" rec || [[ $rec ]]; do
    printf 'Record: %q\n' "$rec"
done < <(jq -j --arg sep "$sep" 'join($sep)' <<<"$data")
...but it also works in a more interesting scenario where naive answers fail:
#!/usr/bin/env bash
data='["two\nlines","*"]'
while IFS= read -r -d $'\x1e' rec || [[ $rec ]]; do
    printf 'Record: %q\n' "$rec"
done < <(jq -j 'join("\u001e")' <<<"$data")
returns (when run on Cygwin, hence the CRLF):
Record: $'two\r\nlines'
Record: \*
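The `|| [[ $rec ]]` in these loops is there because `join` puts the separator only between elements, so the last record reaches `read` without a trailing delimiter; `read` then reports failure at end of input even though it has filled `rec`. A minimal sketch of that behavior, using `printf` as a stand-in for the `jq -j` output so it runs without jq (the function names `emit`, `count_plain` and `count_guarded` are hypothetical, for illustration only):

```shell
#!/usr/bin/env bash
sep=$'\x1e'

# Stand-in for `jq -j 'join("\u001e")'`: three records, no trailing separator.
emit() { printf 'one%stwo%sthree' "$sep" "$sep"; }

# Without the guard, the unterminated final record is dropped.
count_plain() {
    local rec n=0
    while IFS= read -r -d "$sep" rec; do
        n=$((n + 1))
    done < <(emit)
    echo "$n"
}

# With the guard, a non-empty leftover in rec is still processed.
count_guarded() {
    local rec n=0
    while IFS= read -r -d "$sep" rec || [[ $rec ]]; do
        n=$((n + 1))
    done < <(emit)
    echo "$n"
}

echo "plain:   $(count_plain)"
echo "guarded: $(count_guarded)"
```

This prints `plain:   2` and `guarded: 3`. One caveat of the guard: if the final array element is an empty string, `rec` is empty at end of input and that last record is still skipped.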
That said, if using this in anger, I would suggest using NUL delimiters, and filtering them out from the input values:
#!/usr/bin/env bash
data='["two\nlines","three\ttab-separated\twords","*","nul\u0000here"]'
while IFS= read -r -d '' rec || [[ $rec ]]; do
    printf 'Record: %q\n' "$rec"
done < <(jq -j '[.[] | gsub("\u0000"; "@NUL@")] | join("\u0000")' <<<"$data")
NUL is a good choice because it's a character that can't be stored in C strings (like the ones bash uses) at all, so nothing the shell could have represented is lost when NULs are excised from the input values. If they did make it through to the shell, it would (depending on version) either discard them or truncate the string at the point where the first one appears.
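If the records are needed after the loop rather than inside it, the same NUL-delimited read can fill a bash array. A minimal sketch, assuming bash rather than plain sh; `read_nul_records` and the global array `records` are hypothetical names, and `printf` stands in for the `jq -j '... | join("\u0000")'` pipeline so the snippet runs without jq:

```shell
#!/usr/bin/env bash

# Hypothetical helper: read NUL-delimited records from stdin
# into the global array `records`.
read_nul_records() {
    local rec
    records=()
    while IFS= read -r -d '' rec || [[ $rec ]]; do
        records+=("$rec")
    done
}

# Stand-in for the jq output: records separated (not terminated) by NUL.
read_nul_records < <(printf 'two\nlines\0three\ttab-separated\twords\0*')

printf 'Count: %d\n' "${#records[@]}"
printf 'Record: %q\n' "${records[@]}"
```

Because the array is filled in the current shell (the producer runs in the process substitution, not the loop), `records` is still populated after the function returns. On bash 4.4 or newer, `mapfile -d ''` is an alternative to the explicit loop.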