
What is the default FORMAT string of printf in bash?


I am trying to write a script that computes base32 strings from ByteString values, which I receive as key-value pairs. The ByteStrings use octal escape sequences AND backslash escape sequences.

Consider this script:

#! /bin/bash

LINE='  bytes: "LaPaLaPa\363\""'

echo $LINE

K="${LINE%%: *}"
V="${LINE#*: }"
V="${V#\"}"
V="${V%\"}"      
K="${K^^}"

echo "KV='${K}'='${V}'"

FOO="$(printf "%b" "${V}")"
echo "=========================================="
printf "${FOO}" | wc -c
printf "${FOO}" | od -bc -tu1 -w24
printf "${FOO}" | base32 | tr -d "="
echo "Correct or at least wanted result!"
echo "------------------------------------------"
printf '%s' "${FOO}" | wc -c
printf '%s' "${FOO}" | od -bc -tu1 -w24
printf '%s' "${FOO}" | base32 | tr -d "="
echo "------------------------------------------"
printf '%b' "${FOO}" | wc -c
printf '%b' "${FOO}" | od -bc -tu1 -w24
printf '%b' "${FOO}" | base32 | tr -d "="
echo "------------------------------------------"
printf "%s" "${FOO}" | wc -c
printf "%s" "${FOO}" | od -bc -tu1 -w24
printf "%s" "${FOO}" | base32 | tr -d "="
echo "------------------------------------------"
printf "%b" "${FOO}" | wc -c
printf "%b" "${FOO}" | od -bc -tu1 -w24
printf "%b" "${FOO}" | base32 | tr -d "="

I get this output from it:

bytes: "LaPaLaPa\363\""
KV='  BYTES'='LaPaLaPa\363\"'
==========================================
10
0000000 114 141 120 141 114 141 120 141 363 042
          L   a   P   a   L   a   P   a 363   "
         76  97  80  97  76  97  80  97 243  34
0000012
JRQVAYKMMFIGD4ZC
Correct or at least wanted result!
------------------------------------------
11
0000000 114 141 120 141 114 141 120 141 363 134 042
          L   a   P   a   L   a   P   a 363   \   "
         76  97  80  97  76  97  80  97 243  92  34
0000013
JRQVAYKMMFIGD424EI
------------------------------------------
11
0000000 114 141 120 141 114 141 120 141 363 134 042
          L   a   P   a   L   a   P   a 363   \   "
         76  97  80  97  76  97  80  97 243  92  34
0000013
JRQVAYKMMFIGD424EI
------------------------------------------
11
0000000 114 141 120 141 114 141 120 141 363 134 042
          L   a   P   a   L   a   P   a 363   \   "
         76  97  80  97  76  97  80  97 243  92  34
0000013
JRQVAYKMMFIGD424EI
------------------------------------------
11
0000000 114 141 120 141 114 141 120 141 363 134 042
          L   a   P   a   L   a   P   a 363   \   "
         76  97  80  97  76  97  80  97 243  92  34
0000013
JRQVAYKMMFIGD424EI

OK, so why don't I just use the first result if it seems to work?

Well, one reason is that printf should not be used without a FORMAT string, I guess, and I assume there is some FORMAT string that printf uses by default(?) and that does what I want. The other reason is that with other ByteStrings I got errors ONLY when I didn't provide a FORMAT string (printf: ...: invalid format character). I think this happened when the ByteString contained percent characters, but I am not sure, and unfortunately I don't have an example right now that reproduces it. So I have to provide a FORMAT string to be safe, right? But as you can see, the other FORMAT strings I tried give the wrong result for this example.

If there is a FORMAT string that just works in every case, I could use that one, but I have not found any default so far.

So what is the default FORMAT for the printf bash builtin function?

EDIT The question in the title was answered in good detail, so thanks for that first of all. I have learned to check the synopsis carefully, so I could have figured that out myself.

The problem is a bit more complex because the input mixes octal escapes and backslash escapes. If I put the ByteString in double quotes somewhere to have it interpolated automatically, the octal values are not interpolated correctly: only the FIRST of the THREE digits gets escaped. So the two bytes "\363\"" inside the double quotes would become 363", which is 4 bytes (3, 6, 3 and a double quote) and NOT the byte with octal value 363 followed by a double quote!

So my question (now that I know more about printf, and that the upstream emits non-standard ByteStrings) is: what is the best, most fail-safe strategy? Would it make sense to convert the octal escape sequences myself first, and then let bash (I assume it is bash that does the interpolation between double quotes?) interpolate the remaining backslash escapes? Or how would I do this in two steps? The strategy I tried with printf '%s' or '%b' in the script has not worked so far, and I don't know how to make it work.

So to sum up, I guess the right strategy would be to first replace the octal escapes with the corresponding characters OR with standard backslash escapes, so that the result can then be interpolated by bash itself when put between double quotes. Is this right? If so, how can it be done?

EDIT2 As suggested by Aaron in the comments, I came up with a solution: use the printf FORMAT string %b to convert the octal escape sequences to characters, and right after that replace every occurrence of \" with a plain double quote ".

printf '%b' "${FOO}" | sed 's|\\"|"|g' | wc -c
printf '%b' "${FOO}" | sed 's|\\"|"|g' | od -bc -tu1 -w24
printf '%b' "${FOO}" | sed 's|\\"|"|g' | base32 | tr -d "="

Output:

10
0000000 114 141 120 141 114 141 120 141 363 042
          L   a   P   a   L   a   P   a 363   "
         76  97  80  97  76  97  80  97 243  34
0000012
JRQVAYKMMFIGD4ZC

This seems to work as I get the result that is correct in this case.

I hope this produces the correct results in every case as well...
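
To make the EDIT2 pipeline easier to try against more inputs, it can be wrapped in a small function. This is only a sketch under assumptions: the name decode_bytestring is made up here, the value is expected to have its surrounding quotes already stripped (as V is in the script above), and sed is assumed to be GNU sed, which does not append a newline to input that lacks one. Because the value is passed as an argument to %b rather than as the format, percent characters in the data should not be interpreted.

```bash
#!/bin/bash

# Hypothetical helper (not part of the original script): decode one
# ByteString value to raw bytes on stdout.
decode_bytestring() {
  # Step 1: %b expands octal escapes such as \363 found in the argument.
  # Step 2: %b leaves \" untouched, so sed turns each \" into a plain ".
  # LC_ALL=C makes sed treat the stream as single bytes, not UTF-8 text.
  printf '%b' "$1" | LC_ALL=C sed 's|\\"|"|g'
}

V='LaPaLaPa\363\"'
decode_bytestring "$V" | wc -c                # number of decoded bytes
decode_bytestring "$V" | base32 | tr -d '='   # unpadded base32
```

For the sample value this should reproduce the 10 bytes and the JRQVAYKMMFIGD4ZC string shown above. Other escapes that the upstream might emit would need to be checked case by case.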


Solution

  • printf can't be used without a format string: when you call it with a single argument, that argument is parsed as the format.

    Consider its synopsis in man bash:

    printf [-v var] format [arguments]

    It's the arguments list that is optional, not the format.

    man bash goes on to say that plain characters in the format string are copied as-is to the output stream, which is why you can use printf 'message' as you would echo 'message'.

    However, it adds that printf also recognizes character escape sequences and converts them before printing (similar to what echo -e does) and, most importantly, "format character sequences" (%X substrings), which it replaces with the (possibly transformed) additional arguments, or with a default value if there are no arguments left to consume.

    This is why you shouldn't printf "$message": your $message might contain character sequences that printf will interpret.

    If you want to print a message as-is, use printf '%s' "$message", where %s is the format specifier that asks printf to output the (text) argument as text, that is, unmodified.
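
The difference between passing data as the format and passing it as an argument can be seen in a short sketch. The helper name print_raw and the sample message are made up for illustration; the behavior shown relies on bash treating a missing argument for %d as zero.

```bash
#!/bin/bash

# Hypothetical helper: print the argument unchanged, with no interpretation.
print_raw() {
  printf '%s' "$1"
}

message='discount: 20%d\n'   # made-up data containing a % and a backslash

# Data used as the format: %d consumes a (missing) argument and \n becomes
# a newline, so this prints "discount: 200".
printf "$message"

# Data used as an argument: printed exactly as stored, "discount: 20%d\n".
print_raw "$message"; echo
```

With '%s' neither the percent sign nor the backslash escape is touched, which is why it is the safe choice when the goal is only to print a value.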