This week I wrote an extension for the printf family of functions to accept %b to print binary. For that, I used the function register_printf_specifier().
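For context, a %b handler of this kind might look roughly like the sketch below. It assumes glibc's <printf.h>; the names print_binary, print_binary_arginfo, and register_binary_specifier are made up for illustration, and only unsigned int values, the field width, and the - flag are handled.

```c
#include <limits.h>
#include <printf.h>   /* glibc-specific: register_printf_specifier */
#include <stdio.h>

/* Handler: formats one unsigned int argument as binary digits. */
static int print_binary(FILE *stream, const struct printf_info *info,
                        const void *const *args)
{
    unsigned int value = *(const unsigned int *)args[0];
    char digits[sizeof value * CHAR_BIT + 1];
    char *p = digits + sizeof digits - 1;

    /* Build the digit string from the least significant bit backwards. */
    *p = '\0';
    do {
        *--p = (char)('0' + (value & 1u));
        value >>= 1;
    } while (value != 0);

    /* Honor the field width and the '-' flag; other flags are ignored. */
    return fprintf(stream, "%*s", info->left ? -info->width : info->width, p);
}

/* Tells printf that the conversion consumes one int-sized argument. */
static int print_binary_arginfo(const struct printf_info *info, size_t n,
                                int *argtypes, int *size)
{
    (void)info;
    (void)size;
    if (n > 0)
        argtypes[0] = PA_INT;
    return 1;
}

/* Returns 0 on success, -1 on failure. */
static int register_binary_specifier(void)
{
    return register_printf_specifier('b', print_binary, print_binary_arginfo);
}
```

After calling register_binary_specifier() once, a call such as printf("%b\n", 5u) goes through print_binary. Compilers may warn about %b in the format string, depending on their version and flags.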
Now I wonder if I can do the same in the scanf family of functions to accept a binary input and write it into a variable.
Is there any extension that allows me to do that?
TL;DR: No. At least not when using glibc.
I downloaded a recent glibc version:
% wget https://ftp.gnu.org/gnu/glibc/glibc-2.29.tar.gz
% tar -xzf glibc-2.29.tar.gz
% cd glibc-2.29
Then I piped find into grep, searching for the first scanf-family function that came to mind, in this case vfscanf:
% find | grep "vfscanf"
From experience I know that the real implementations live in the -internal files, but I looked through the whole output anyway:
./stdio-common/iovfscanf.c
./stdio-common/isoc99_vfscanf.c
./stdio-common/vfscanf-internal.c
./stdio-common/vfscanf.c
./sysdeps/ieee754/ldbl-opt/nldbl-iovfscanf.c
./sysdeps/ieee754/ldbl-opt/nldbl-isoc99_vfscanf.c
./sysdeps/ieee754/ldbl-opt/nldbl-vfscanf.c
I decided to check ./stdio-common/vfscanf.c, which in fact contains only a stub that calls the internal function:
% cat ./stdio-common/vfscanf.c
int
___vfscanf (FILE *s, const char *format, va_list argptr)
{
  return __vfscanf_internal (s, format, argptr, 0);
}
Going further, I looked through the internal file and reached the format parser:
% cat ./stdio-common/vfscanf-internal.c | head -n 1390 | tail -n 20
            }
          break;
        case L_('x'): /* Hexadecimal integer. */
        case L_('X'): /* Ditto. */
          base = 16;
          goto number;
        case L_('o'): /* Octal integer. */
          base = 8;
          goto number;
        case L_('u'): /* Unsigned decimal integer. */
          base = 10;
          goto number;
        case L_('d'): /* Signed decimal integer. */
          base = 10;
          flags |= NUMBER_SIGNED;
          goto number;
I then looked at the end of the file and found the final case labels:
% cat ./stdio-common/vfscanf-internal.c | tail -n 60
                  ++done;
                }
            }
          break;
        case L_('p'): /* Generic pointer. */
          base = 16;
          /* A PTR must be the same size as a `long int'. */
          flags &= ~(SHORT|LONGDBL);
          if (need_long)
            flags |= LONG;
          flags |= READ_POINTER;
          goto number;
        default:
          /* If this is an unknown format character punt. */
          conv_error ();
        }
    }
  /* The last thing we saw int the format string was a white space.
     Consume the last white spaces. */
  if (skip_space)
    {
      do
        c = inchar ();
      while (ISSPACE (c));
      ungetc (c, s);
    }
 errout:
  /* Unlock stream. */
  UNLOCK_STREAM (s);
  scratch_buffer_free (&charbuf.scratch);
  if (__glibc_unlikely (done == EOF))
    {
      if (__glibc_unlikely (ptrs_to_free != NULL))
        {
          struct ptrs_to_free *p = ptrs_to_free;
          while (p != NULL)
            {
              for (size_t cnt = 0; cnt < p->count; ++cnt)
                {
                  free (*p->ptrs[cnt]);
                  *p->ptrs[cnt] = NULL;
                }
              p = p->next;
              ptrs_to_free = p;
            }
        }
    }
  else if (__glibc_unlikely (strptr != NULL))
    {
      free (*strptr);
      *strptr = NULL;
    }
  return done;
}
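And after that comes the code that finishes the function. The default label simply reports a conversion error for any unknown format character, with no lookup in a table of user-registered handlers. This means the set of format specifiers is hard-coded in the scanf-family functions, so you can't register a new handler without messing with the large clusterf..k in the glibc source (which of course won't be portable).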
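Since scanf itself can't be extended, one possible workaround is to read a token with a bounded %s and convert it with the standard strtoul using base 2. The sketch below is only an illustration, not anything glibc provides: parse_binary and scan_binary are hypothetical helper names, the 64-digit token limit is an arbitrary choice, and the target is assumed to be an unsigned int.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parses a string of '0'/'1' digits into *out.
   Returns 1 on success and 0 on failure, loosely mirroring scanf's
   "number of items assigned" convention. */
static int parse_binary(const char *text, unsigned int *out)
{
    char *end;
    unsigned long value;

    /* strtoul would also accept leading whitespace and a sign, so
       require the first character to be a binary digit. */
    if (text[0] != '0' && text[0] != '1')
        return 0;

    errno = 0;
    value = strtoul(text, &end, 2);   /* base 2 */
    if (*end != '\0' || errno == ERANGE || value > UINT_MAX)
        return 0;

    *out = (unsigned int)value;
    return 1;
}

/* Hypothetical helper: reads one whitespace-delimited token of at most
   64 characters from the stream, then converts it. */
static int scan_binary(FILE *stream, unsigned int *out)
{
    char token[65];

    if (fscanf(stream, "%64s", token) != 1)
        return 0;
    return parse_binary(token, out);
}
```

A call such as scan_binary(stdin, &n) then plays the role that scanf("%b", &n) would have played. Unlike a real scanf conversion, it consumes the whole token even when the conversion fails.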