java hadoop hive distinct radix

Convert 15 char length string to unique long number - using BigInteger longValue()


I want to convert a string of at most 15 characters to a unique long number. I am trying to use BigInteger's longValue() method for this.

BigInteger bigInt = new BigInteger("abcdeabcdeabcda".getBytes());
long n = bigInt.longValue();
  1. Can collisions between long values be avoided for strings of up to 15 characters?
  2. The string can contain alphanumeric characters as well as special characters.
  3. The idea is not to encrypt the string into a long, but to improve the performance of count(distinct) in Hive queries.
  4. We have observed that count(distinct) in Hive performs well when a long is used instead of a string.
  5. We don't want an approximate or probabilistic count distinct. We want an exact count distinct.

Thanks in Advance


Solution

  • No, you can't, at least not without collisions.

    ASCII uses 7 bits per character and 15 * 7 = 105 bits. You cannot fit that into a 64-bit long.

    You suggest you may not need full ASCII. Perhaps base 64, which is 6 bits per character, but 15 * 6 = 90 bits is still far too long.

    Even if case is irrelevant and you can get by without four of your alphanumeric characters, base 32 still needs 15 * 5 = 75 bits, which is still too big for a 64-bit number.

    You will need to accept that there will be collisions but perhaps there are ways to reduce them. How are you generating these 15-character strings? Is there a pattern you can make use of?

    The accepted answer to the question that @Athanor points to has a good idea: use two longs. 2 * 64 = 128 bits, so your number of up to 105 bits (15 characters of 7-bit ASCII) would fit comfortably into two longs.
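
As a rough illustration of both points, here is a minimal sketch. It assumes the input is pure 7-bit ASCII; the class name TwoLongKey and the methods pack and longValueCollides are made up for this example and do not come from the linked answer. The first method shows that longValue() keeps only the low-order 64 bits, so two strings that share their last 8 bytes collide. The second packs the string length and the first 8 characters into one long and the remaining 7 characters into a second long, so distinct strings always yield distinct pairs.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class TwoLongKey {

    /** True if BigInteger.longValue() maps both strings to the same long. */
    static boolean longValueCollides(String a, String b) {
        long x = new BigInteger(a.getBytes(StandardCharsets.US_ASCII)).longValue();
        long y = new BigInteger(b.getBytes(StandardCharsets.US_ASCII)).longValue();
        return x == y;
    }

    /**
     * Packs a string of at most 15 seven-bit ASCII characters into two longs.
     * Assumption: every character is in the range 0..127.
     * high = 4-bit length followed by characters 0..7 (60 bits used)
     * low  = characters 8..14 (49 bits used)
     */
    static long[] pack(String s) {
        int n = s.length();
        if (n > 15) {
            throw new IllegalArgumentException("at most 15 characters supported");
        }
        long high = n; // the length disambiguates zero padding from real NUL characters
        long low = 0L;
        for (int i = 0; i < 15; i++) {
            long c = 0L; // positions past the end are padded with zero
            if (i < n) {
                c = s.charAt(i);
                if (c > 127) {
                    throw new IllegalArgumentException("not 7-bit ASCII: " + s.charAt(i));
                }
            }
            if (i < 8) {
                high = (high << 7) | c;
            } else {
                low = (low << 7) | c;
            }
        }
        return new long[] {high, low};
    }

    public static void main(String[] args) {
        // Both strings end in the same 8 bytes, so the low 64 bits are identical.
        System.out.println(longValueCollides("Xabcdefgh", "Yabcdefgh"));

        long[] k1 = pack("abcdeabcdeabcda");
        long[] k2 = pack("abcdeabcdeabcdb");
        System.out.println(Arrays.equals(k1, k2));
    }
}
```

Whether a pair of longs helps count(distinct) in Hive as much as a single long does is something you would need to measure; this sketch only shows that an exact, collision-free encoding fits in 128 bits.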