
Java fast atoi using byte[]


I'm working on an application that reads and processes flat files. These files don't always use a consistent encoding for every field in a record, so we decided to read and write raw bytes and avoid the decoding/encoding needed to turn them into Strings.

However, a lot of these fields are simple integers, and I need to validate them (test that they are really integers and in a certain range). I need a function that receives a byte[] and turns that into an int. I'm assuming all the digits are plain ASCII.

I know I could do this by first turning the byte[] into a CharBuffer, decoding it as ISO-8859-1 or UTF-8, and then calling Integer.parseInt(), but that seems like a lot of overhead, and performance is important.
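For reference, the decode-then-parse route described above could look roughly like the following. This is only a minimal sketch; the class name DecodeThenParse and the method name parse are made up for illustration, and ISO-8859-1 is assumed as the charset.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, shown only as the baseline that a byte-level parser would replace.
final class DecodeThenParse {
    private DecodeThenParse() {}

    // Decodes the whole field to characters, then lets Integer.parseInt validate it.
    // Throws NumberFormatException if the field is not a valid int.
    static int parse(byte[] field) {
        CharBuffer chars = StandardCharsets.ISO_8859_1.decode(ByteBuffer.wrap(field));
        return Integer.parseInt(chars.toString());
    }
}
```

Each call allocates a CharBuffer and a String before any digit is examined, which is the overhead in question.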

So, basically what I need is a Java equivalent of atoi(). I would prefer an API function (including 3rd party APIs). Also, the function should report errors in some way.

As a side note, I'm having the same issue with fields representing dates/times (these are rarer, though). It would be great if someone could point me to a fast C-like library for Java.


Solution

  • While I cannot give you a ready-made Java solution, I want to point you to some interesting C code. The author of qmail wrote a small function, scan_ulong, that quickly parses an unsigned long from a character array. You can find many versions of that function all over the web:

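    /* Parses the leading decimal digits of s into *u and returns how many
       characters were consumed (0 means no digits were found).
       Overflow is not detected. */
    unsigned int scan_ulong(register const char *s,register unsigned long *u)
    {
      register unsigned int pos = 0;
      register unsigned long result = 0;
      register unsigned long c;
      /* The cast to unsigned char makes any byte below '0' wrap to a large
         value, so a single "< 10" test rejects every non-digit, including NUL. */
      while ((c = (unsigned long) (unsigned char) (s[pos] - '0')) < 10) {
        result = result * 10 + c;
        ++pos;
      }
      *u = result;
      return pos;
    }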
    

    (Taken from https://github.com/jordansissel/djbdnsplus/blob/master/scan_ulong.c )

    That code should translate to Java fairly smoothly.
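A possible Java adaptation of the same digit loop is sketched below. It is not a line-for-line port: Java arrays are not NUL-terminated, so this version takes an explicit offset and length, and instead of returning the number of characters consumed it throws NumberFormatException on an empty field, a non-digit byte, or int overflow. The class name AsciiInts and the method name parseUnsignedDecimal are hypothetical, and the input is assumed to be plain ASCII digits with no sign or whitespace.

```java
// Hypothetical helper; a sketch under the assumptions stated above, not a library API.
final class AsciiInts {
    private AsciiInts() {}

    // Parses the ASCII decimal digits in buf[off .. off + len) into a non-negative int.
    // Throws NumberFormatException for an empty field, a non-digit byte, or overflow.
    static int parseUnsignedDecimal(byte[] buf, int off, int len) {
        if (len <= 0) {
            throw new NumberFormatException("empty field");
        }
        int result = 0;
        for (int i = off; i < off + len; i++) {
            // Bytes are signed in Java, so anything outside '0'..'9'
            // (including bytes >= 0x80) ends up below 0 or above 9 here.
            int digit = buf[i] - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("non-digit byte at index " + i);
            }
            // Reject the digit if result * 10 + digit would exceed Integer.MAX_VALUE.
            if (result > (Integer.MAX_VALUE - digit) / 10) {
                throw new NumberFormatException("value does not fit in an int");
            }
            result = result * 10 + digit;
        }
        return result;
    }
}
```

A caller could parse a field in place with something like AsciiInts.parseUnsignedDecimal(record, fieldStart, fieldLength), where record, fieldStart, and fieldLength are whatever the file layout provides, and then compare the returned value against the allowed range for that field.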