
sscanf and scanset stop reading hex numbers


I am trying to validate a UUID v4 with sscanf. If the UUID can be read completely with sscanf (i.e. the total number of characters read is 36), I assume it is a valid UUID. My code so far:

#include <stdio.h>

int main()
{
    char uuid[ 37 ] = "da4dd6a0-5d4c-4dc6-a5e3-559a89aff639";
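    unsigned int a = 0, b = 0, c = 0, d = 0, e = 0;  /* %x expects unsigned int * */
    unsigned long long f = 0;                         /* %llx expects unsigned long long * */
    int g = 0;                                        /* %n expects int * */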

    printf( "uuid >%s<, variables read: %d \n", uuid, sscanf( uuid, "%8x-%4x-4%3x-%1x%3x-%12llx%n", &a, &b, &c, &d, &e, &f, &g ) );
    printf( " a - %x, b - %x,  c - %x,  d - %x,  e - %x, f - %llx, total number of characters read - %d \n", a, b, c, d, e, f, g );

    return 0;
}

which returns the following output:

uuid >da4dd6a0-5d4c-4dc6-a5e3-559a89aff639<, variables read: 6 
 a - da4dd6a0, b - 5d4c,  c - dc6,  d - a,  e - 5e3, f - 559a89aff639, total number of characters read - 36 

So far, everything is okay. Now I want to check that the first character after the third hyphen is one of [89ab], so I changed %1x%3x to %1x[89ab]%3x. But now only that first character is read and nothing after it. The output:

uuid >da4dd6a0-5d4c-4dc6-a5e3-559a89aff639<, variables read: 4 
a - da4dd6a0, b - 5d4c,  c - dc6,  d - a,  e - 0, f - 0, total number of characters read - 0 

What am I missing? What is wrong with the syntax? Is it possible to read it like this? I tried several combinations of the scanset and the specifier, but nothing works.


Solution

  • Instead of using sscanf() for this task, you might just write a simple dedicated function:

    #include <ctype.h>
    #include <string.h>
    
    int check_UUID(const char *s) {
        int i;
        for (i = 0; s[i]; i++) {
            if (i == 8 || i == 13 || i == 18 || i == 23) {
                if (s[i] != '-')
                    return 0;
            } else {
                if (!isxdigit((unsigned char)s[i]))
                    return 0;
            }
        }
        if (i != 36)
            return 0;
    
        // you can add further tests for specific characters:
        if (!strchr("89abAB", s[19]))
            return 0;
    
        return 1;
    }
    

    If you insist on using sscanf(), here is a concise implementation:

    #include <stdio.h>
    
    int check_UUID(const char *s) {
        int n = 0;
        sscanf(s, "%*8[0-9a-fA-F]-%*4[0-9a-fA-F]-%*4[0-9a-fA-F]-%*4[0-9a-fA-F]-%*12[0-9a-fA-F]%n", &n);
        return n == 36 && s[n] == '\0';
    }
    

    If you want to refine the test for the first character after the third hyphen, add another character class:

    #include <stdio.h>
    
    int check_UUID(const char *s) {
        int n = 0;
        sscanf(s, "%*8[0-9a-fA-F]-%*4[0-9a-fA-F]-%*4[0-9a-fA-F]-%*1[89abAB]%*3[0-9a-fA-F]-%*12[0-9a-fA-F]%n", &n);
        return n == 36 && s[n] == '\0';
    }
    

    Notes:

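    • The * after the % suppresses assignment: the matching characters are consumed but not stored, so no argument is needed for them. The 1 is the maximum field width, so at most 1 character is consumed.
    • A scanset is a conversion of its own, %[...], not a modifier of %x. In your %1x[89ab], the %1x reads one hex digit and [89ab] is then treated as literal text that must match the input; the [ does not match the 5, so sscanf stops there and returns 4.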
    • For the number of characters parsed by sscanf to reach 36, all hex digit sequences must have exactly the specified width.
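    • %n causes sscanf to store the number of characters read so far into the int pointed to by the next argument. It does not count toward the return value, and if a directive fails before %n is reached, n keeps its initial value, which is why it is initialized to 0.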
    • Your conversion specification is useful for getting the actual UUID numbers, but the %x format accepts leading white space, an optional sign and an optional 0x or 0X prefix, all of which are invalid inside a UUID. You can first validate the UUID, then convert it to its individual parts if required.