Tags: bash, special-characters, stdout, stdin

Capturing special characters from stdin to a shell variable


I have a program whose output contains null bytes (\0), special characters such as \x1f, and newlines. For instance:

someprogram

#!/bin/bash
printf "ALICE\0BOB\x1fCHARLIE\n"

Given such a program, I want to read its output so that all those special characters are captured in a shell variable named output. So, if I run:

echo $output

because I'm not giving -e, I'd want the output to be:

ALICE\0BOB\x1fCHARLIE\n

How can this be achieved?

My first attempt was:

output=$(someprogram)

But I got this echoed output which doesn't have the special characters:

./myscript.sh: line 2: warning: command substitution: ignored null byte in input
ALICEBOBCHARLIE

I also tried to use read as follows:

output=""
while read -r
do
    output="$output$REPLY"
done < <(someprogram)

Then I got rid of the warning but the output is still missing all special characters:

ALICEBOBCHARLIE

So how can I capture the output of someprogram in such a way that I have all the special characters in my resulting string?

EDIT: Note that it is possible to have such strings in bash:

$ x="ALICE\0BOB\x1fCHARLIE\n"
$ echo $x
ALICE\0BOB\x1fCHARLIE\n

So that shouldn't be the problem.

EDIT2: Let me reformulate the question now that I have an accepted answer and understand things better. I just need to store the output of someprogram in a shell variable in such a way that I can later print it to stdout with every special character unchanged, as if someprogram had been piped directly to stdout.


Solution

  • You can't store a zero byte in a bash variable. It's impossible: bash holds variable values as null-terminated C strings, so a zero byte cannot be part of the value.

    The usual solution is to convert the stream of bytes into hexadecimal, then convert it back each time you want to do something with it.

    $ x=$(printf "ALICE\0BOB\x1fCHARLIE\n" | xxd -p)
    $ echo "$x"
    414c49434500424f421f434841524c49450a
    $ <<<"$x" xxd -p -r | hexdump -C
    00000000  41 4c 49 43 45 00 42 4f  42 1f 43 48 41 52 4c 49  |ALICE.BOB.CHARLI|
    00000010  45 0a                                             |E.|
    00000012
    

    You can also write your own serialization and deserialization functions for the purpose.
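
As a rough illustration of what such functions could look like, here is a minimal sketch that uses only od, tr and bash's builtin printf instead of xxd. The names serialize and deserialize are hypothetical, and the hex format is just one possible encoding, not the only way to do it.

```bash
#!/bin/bash
# Hypothetical helper names; od and tr are standard POSIX utilities.

# Read raw bytes from stdin and print them as a single hex string.
serialize() {
    od -An -v -tx1 | tr -d ' \n'
}

# Take a hex string as $1 and write the original bytes to stdout.
deserialize() {
    local hex=$1 i
    for ((i = 0; i < ${#hex}; i += 2)); do
        # Builds a format such as \x41; bash's printf emits that byte,
        # including a real zero byte for \x00.
        printf "\\x${hex:i:2}"
    done
}

# The variable only ever holds hex digits, so no byte is lost.
x=$(printf 'ALICE\0BOB\x1fCHARLIE\n' | serialize)
echo "$x"

# Decode on demand and inspect the bytes.
deserialize "$x" | od -c
```

The byte-by-byte loop in deserialize is slow for large inputs, so for anything beyond small payloads a dedicated tool such as xxd -r -p is likely the better choice.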

    Another idea is to read the data into an array, using the zero byte as the separator (every other byte can be stored in a variable). This approach, however, cannot tell whether the input ended with a trailing zero byte:

     $ readarray -d '' arr < <(printf "ALICE\0BOB\x1fCHARLIE\n")
     $ printf "%s\0" "${arr[@]}" | hexdump -C
     00000000  41 4c 49 43 45 00 42 4f  42 1f 43 48 41 52 4c 49  |ALICE.BOB.CHARLI|
     00000010  45 0a 00                                          |E..|
     #               ^^ additional zero byte if input doesn't contain a trailing zero byte
     00000013
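
One possible way around the trailing zero byte ambiguity is to append a sentinel byte to the stream before reading it, then strip it again when replaying. The following is a sketch under a few assumptions: someprogram is defined here as a stand-in function for the program in the question, replay is a hypothetical helper name, and readarray -d needs bash 4.4 or newer.

```bash
#!/bin/bash
# Stand-in for the question's program; its output has no trailing zero byte.
someprogram() { printf 'ALICE\0BOB\x1fCHARLIE\n'; }

# Append a sentinel "x" so the last array element always ends in "x",
# whether or not the real output ended with a zero byte.
readarray -d '' arr < <(someprogram; printf 'x')

# Replay the stream: every element except the last was followed by a zero byte.
replay() {
    local last=$(( ${#arr[@]} - 1 ))
    if (( last > 0 )); then
        printf '%s\0' "${arr[@]:0:last}"
    fi
    # Strip the sentinel from the final element; no zero byte follows it.
    printf '%s' "${arr[last]%x}"
}

replay | od -c
```

If the program's output does end with a zero byte, the last element is just the sentinel, so replay prints an empty string after the final zero byte and the stream still round-trips.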