Tags: c, gcc, binary, scanf, glibc

scanf register new conversion specifier


This week I wrote an extension for the printf family of functions so that it accepts %b to print binary. For that, I used the function register_printf_specifier().
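
For reference, a minimal sketch of what such a %b extension might look like with glibc's <printf.h> interface. The names print_binary, print_binary_arginfo and install_binary_specifier are hypothetical, and this is not necessarily how the extension mentioned above was written. It only honors the field width and the '-' flag.

```c
#include <limits.h>
#include <printf.h>
#include <stdio.h>

/* Hypothetical handler: prints one unsigned int argument in base 2. */
static int print_binary(FILE *stream, const struct printf_info *info,
                        const void *const *args)
{
    /* args[0] points to the storage holding the fetched argument. */
    unsigned int value = *(const unsigned int *)args[0];
    char digits[sizeof value * CHAR_BIT + 1];
    int pos = (int)sizeof digits - 1;

    /* Build the digit string from the least significant bit upwards. */
    digits[pos] = '\0';
    do {
        digits[--pos] = (char)('0' + (value & 1u));
        value >>= 1;
    } while (value != 0);

    /* Honor the field width and '-' flag; other flags are ignored here. */
    return fprintf(stream, "%*s",
                   info->left ? -info->width : info->width, digits + pos);
}

/* Tells printf that %b consumes exactly one int-sized argument. */
static int print_binary_arginfo(const struct printf_info *info, size_t n,
                                int *argtypes, int *size)
{
    (void)info;
    (void)size;
    if (n > 0)
        argtypes[0] = PA_INT;
    return 1;
}

/* Hypothetical wrapper: returns 0 on success, -1 on failure. */
int install_binary_specifier(void)
{
    return register_printf_specifier('b', print_binary,
                                     print_binary_arginfo);
}
```

After a successful install_binary_specifier() call, printf("%b\n", 10u) prints 1010.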

Now I wonder whether I can do the same for the scanf family of functions, so that they accept binary input and store it in a variable.

Is there any extension that allows me to do that?


Solution

  • TL;DR: No. At least not when using glibc.


    I downloaded a recent glibc version:

    % wget https://ftp.gnu.org/gnu/glibc/glibc-2.29.tar.gz
    % tar -xzf glibc-2.29.tar.gz
    

    Then I piped find through grep, searching for the first scanf-family function that came to mind, in this case vfscanf:

    % find | grep "vfscanf"
    

    From experience I know that the real implementations usually live in the -internal files, but I looked through the whole output anyway:

    ./stdio-common/iovfscanf.c
    ./stdio-common/isoc99_vfscanf.c
    ./stdio-common/vfscanf-internal.c
    ./stdio-common/vfscanf.c
    ./sysdeps/ieee754/ldbl-opt/nldbl-iovfscanf.c
    ./sysdeps/ieee754/ldbl-opt/nldbl-isoc99_vfscanf.c
    ./sysdeps/ieee754/ldbl-opt/nldbl-vfscanf.c
    

    I decided to check ./stdio-common/vfscanf.c first, which indeed contains only a stub that calls the internal function:

    % cat ./stdio-common/vfscanf.c
    
    int
    ___vfscanf (FILE *s, const char *format, va_list argptr)
    {
      return __vfscanf_internal (s, format, argptr, 0);
    }
    

    Moving on, I looked through ./stdio-common/vfscanf-internal.c and reached the format parser:

    % cat ./stdio-common/vfscanf-internal.c | head -n 1390 | tail -n 20
              }
              break;
    
            case L_('x'):   /* Hexadecimal integer.  */
            case L_('X'):   /* Ditto.  */
              base = 16;
              goto number;
    
            case L_('o'):   /* Octal integer.  */
              base = 8;
              goto number;
    
            case L_('u'):   /* Unsigned decimal integer.  */
              base = 10;
              goto number;
    
            case L_('d'):   /* Signed decimal integer.  */
              base = 10;
              flags |= NUMBER_SIGNED;
              goto number;
    

    I then looked at the end of the file and found the last case labels:

    % cat ./stdio-common/vfscanf-internal.c | tail -n 60
                      ++done;
                    }
                }
              break;
    
            case L_('p'):   /* Generic pointer.  */
              base = 16;
              /* A PTR must be the same size as a `long int'.  */
              flags &= ~(SHORT|LONGDBL);
              if (need_long)
                flags |= LONG;
              flags |= READ_POINTER;
              goto number;
    
            default:
              /* If this is an unknown format character punt.  */
              conv_error ();
            }
        }
    
      /* The last thing we saw int the format string was a white space.
         Consume the last white spaces.  */
      if (skip_space)
        {
          do
            c = inchar ();
          while (ISSPACE (c));
          ungetc (c, s);
        }
    
     errout:
      /* Unlock stream.  */
      UNLOCK_STREAM (s);
    
      scratch_buffer_free (&charbuf.scratch);
    
      if (__glibc_unlikely (done == EOF))
        {
          if (__glibc_unlikely (ptrs_to_free != NULL))
            {
              struct ptrs_to_free *p = ptrs_to_free;
              while (p != NULL)
                {
                  for (size_t cnt = 0; cnt < p->count; ++cnt)
                    {
                      free (*p->ptrs[cnt]);
                      *p->ptrs[cnt] = NULL;
                    }
                  p = p->next;
                  ptrs_to_free = p;
                }
            }
        }
      else if (__glibc_unlikely (strptr != NULL))
        {
          free (*strptr);
          *strptr = NULL;
        }
      return done;
    }
    

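    That is the end of the function. So the conversion specifiers are hard-coded as case labels in the switch statement of __vfscanf_internal, and any unknown format character falls through to the default label and ends in conv_error(). Nothing in this code looks up user-registered handlers the way register_printf_specifier() allows for printf. This implies that you can't register a new handler without patching that large, tangled function in the glibc source (which of course won't be portable).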
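
    Since the parser cannot be extended, one portable workaround is to let scanf collect the digits with a scanset and convert them with strtoul() in base 2. The following is only a sketch under that assumption. The helper name scan_binary and the 64-digit limit are hypothetical choices, not something taken from glibc.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a run of '0'/'1' characters from `in` and
   stores its value in *out.  Returns 1 on success, 0 on a matching or
   range failure, and EOF if the input ends first, mirroring scanf. */
static int scan_binary(FILE *in, unsigned long *out)
{
    char digits[65];            /* up to 64 binary digits plus '\0' */

    /* The leading space skips whitespace; %64[01] accepts only 0 and 1. */
    int matched = fscanf(in, " %64[01]", digits);
    if (matched != 1)
        return matched;         /* 0 (no digits) or EOF */

    errno = 0;
    unsigned long value = strtoul(digits, NULL, 2);
    if (errno == ERANGE)
        return 0;               /* more bits than unsigned long can hold */

    *out = value;
    return 1;
}
```

    Called as scan_binary(stdin, &value) on the input 101101, this stores 45 in value and returns 1. The rest of the input stays in the stream for later scanf calls.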