android, uuid

android, how to generate a shorter version of a uuid (13 chars) on the app side


The Android app needs to generate a UUID with 13 chars, but that may increase the chance of clashing. I came up with the function below; the idea was to add the UUID's most/least significant bits, get the string from that Long, and then take the 13-character part from the result. A test run saw no clashes on a single machine (100,000+ UUIDs), but I'm not sure about the clash probability across machines.

Is there a better way to generate a 13-char UUID with a reasonably low clashing rate?

import java.util.Random
import java.util.UUID

val random = Random()

fun generateUUID(): String {
    val uuid: UUID = UUID.randomUUID()
    val theLong = if (random.nextBoolean()) {
        uuid.mostSignificantBits + uuid.leastSignificantBits
    } else {
        uuid.leastSignificantBits + uuid.mostSignificantBits
    }
    // Radix-36 string of the combined halves; length varies, so the 13-char cut happens afterwards
    return java.lang.Long.toString(theLong, Character.MAX_RADIX)
}
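
For reference, a rough Java sketch of the same idea with the take-13-characters step filled in; the use of Long.toUnsignedString and the '0' padding are my assumptions, since the snippet above leaves that part out:

import java.util.UUID;

public class ShortIdSketch
{
    //Base-36 string of the two halves added together, padded to exactly 13 characters
    //(an unsigned 64-bit value never needs more than 13 base-36 digits)
    public static String generate13Chars()
    {
        UUID uuid = UUID.randomUUID();
        long combined = uuid.getMostSignificantBits() + uuid.getLeastSignificantBits();
        StringBuilder sb = new StringBuilder(Long.toUnsignedString(combined, Character.MAX_RADIX));
        while (sb.length() < 13)
        {
            sb.insert(0, '0');
        }
        return sb.toString();
    }
}

Since the two halves are simply added, the result carries at most 64 bits, which fits comfortably in 13 base-36 characters (about 67 bits).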

Solution

  • It won't be a UUID in the strict sense anymore; UUID describes a very specific data structure. Using the low bits of a proper UUID is generally a bad idea; those were never meant to be unique. Single-machine tests will be inconclusive.

    EDIT: now that I think of it, what exactly is a "char" in the question? A decimal digit? A hex digit? A byte? An ASCII character? A Unicode character? If the latter, you can stuff a full proper UUID in there. Just represent it as binary, not as a hexadecimal string. A UUID is 128 bits long; a Unicode codepoint carries about 20 bits, so 13 of them cover roughly 260 bits, which is more than enough.

    The Java char datatype is, effectively, slightly less than 16 bits. If by "13 chars" you mean a Java string of length 13 (or an array of 13 chars), you can still stuff a UUID there, with some trickery to avoid reserved UTF-16 surrogate pair values.


    All that said, for globally unique ID generation, the usual approach is to combine the current time, a random number, and some kind of device-specific identifier, hashed together. That's roughly how canonical UUIDs work. Depending on the exact nature of the size limit (which is vague in the question), a different hash or encoding may be advisable.
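
    A minimal sketch of that recipe (not part of the original answer; the SHA-256 choice, the deviceId parameter, and the cut to 13 base-36 characters are assumptions for illustration):

    import java.math.BigInteger;
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.security.SecureRandom;

    public class HashedShortId
    {
        private static final SecureRandom RANDOM = new SecureRandom();

        //deviceId could be something like Settings.Secure.ANDROID_ID on Android (assumption)
        public static String shortId(String deviceId) throws NoSuchAlgorithmException
        {
            byte[] salt = new byte[16];
            RANDOM.nextBytes(salt);

            //Hash current time + random bytes + device identifier together
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(Long.toString(System.currentTimeMillis()).getBytes(StandardCharsets.UTF_8));
            md.update(salt);
            md.update(deviceId.getBytes(StandardCharsets.UTF_8));

            //Treat the 256-bit digest as a positive number and keep 13 base-36 digits of it
            return new BigInteger(1, md.digest()).toString(36).substring(0, 13);
        }
    }

    Thirteen base-36 characters carry only about 67 bits of the digest, so collisions are possible in principle, but still very unlikely at realistic volumes.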


    EDIT: about using the whole range of Unicode. First things first: notice that both "du3d2t5fdaib4" and "8efc9756-70ff-4a9f-bf45-4c693bde61a4" draw from tiny alphabets. The first is base 36 - digits 0-9 and lowercase letters; the second is plain hex - 0-9 and a-f - and its dashes can be safely omitted, they're there just for readability. Meanwhile, a single Java char can have one of 63488 possible values - any codepoint from 0 to 0xFFFF, except for the subrange 0xD800..0xDFFF, would do. The string with all those crazy characters won't be nice looking or even printable; it could look something like "芦№Π║ثЯ"; some of the characters might not display in Android because they're not in the system font, but it will be unique all right.

    Is it a requirement that the unique string displays nicely?


    If no, let's see. A UUID is two 64-bit Java longs. It's a signed datatype in Java; would've been easier if it was unsigned, but there's no such thing. We can, however, treat two longs as 4 ints, and make sure the ints are positive.

    Now we have 4 positive ints to stuff into 13 characters. We also don't want to mess with arithmetic that straddles variable boundaries, so let's convert each integer into a 3-character chunk with no overlap; that's 12 characters in total, which fits within the 13-character budget. This wastes some bits, but oh well, we have some bits to spare. An int is 4 bytes long, while 3 Java characters are 6 bytes long.

    When composing the chars, we would like to avoid the area between D800 and DFFF. Also, we would want to avoid the codepoints from 0 to 1F - those are control characters, unprintable by design. Also, let's avoid character 0x20 - that's space. Now, I don't know exactly how the string will be used, or whether it will end up in a text format that doesn't allow for escaping, in which case certain other characters should be avoided to make things simpler downstream.

    A contiguous character range is easier to work with, so let's completely throw away the range upwards from 0xD800, too. That leaves us with 0xD7DF distinct codepoints, starting from 0x21. Three of those are more than enough to cover a 32-bit int. The rule for converting an int into a character triple is straightforward: divide the int by 0xD7DF twice, take the remainders, and add the remainders to the base codepoint (which is 0x21). This is your vanilla "convert an int to a string in base N" algorithm, with the knowledge that there can be no more than three digits.

    All things considered, here goes Java:

    public static String uuidToWeirdString(UUID uuid)
    {
        //Description of our alphabet: codepoints from 0x21 to 0xD7FF
        final long ALPHA_SIZE = 0xD7DF, ALPHA_BASE = 0x21;
    
        //Convert the UUID to a pair of signed, potentially negative longs
        long low = uuid.getLeastSignificantBits(),
            high = uuid.getMostSignificantBits();
    
        //Convert to positive 32-bit ints, represented as signed longs
        long []parts = {
            (high >> 32) & 0xffffffffL,
            high & 0xffffffffL,
            (low >> 32) & 0xffffffffL,
            low & 0xffffffffL
        };
    
        //Convert ints to char triples
        int nPart, pos = 0;
        char []c = new char[12];
        for(nPart=0;nPart<4;nPart++)
        {
            long part = parts[nPart];
            c[pos++] = (char)(ALPHA_BASE + part / (ALPHA_SIZE*ALPHA_SIZE));
            c[pos++] = (char)(ALPHA_BASE + (part / ALPHA_SIZE ) % ALPHA_SIZE);
            c[pos++] = (char)(ALPHA_BASE + part % ALPHA_SIZE);
        }
        return new String(c);
    }
    

    Feast your eyes on the beauty of the Unicode.
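
    For a quick check, a small usage sketch (the main method below is mine, assuming uuidToWeirdString lives in the same class):

    public static void main(String[] args)
    {
        java.util.UUID uuid = java.util.UUID.randomUUID();
        String s = uuidToWeirdString(uuid);
        //Always 12 chars, one under the 13-char budget, drawn from U+0021..U+D7FF
        System.out.println(s + "  (length " + s.length() + ")");
    }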