java android performance encryption cbc-mac

CBC-MAC AES own implementation extremely slow


For a project I need to implement a function on Android (in Java) that generates a CBC-MAC (AES) from a file. Basically, the function takes different 'blocks' from the file, calculates an identifier for every block, and finally combines these into an identifier for the whole file.

The function works, but for bigger files it is extremely slow (it can take minutes to hours) because of the loops involved. My knowledge of cryptography doesn't go very far, so I'm not sure how to improve the speed, or whether that is even possible. The output is exactly the same CBC-MAC that other libraries in different programming languages produce, so the result is correct.

Unfortunately I'm quite limited in using external libraries. The class CBCBlockCipherMac from Bouncy Castle is an option, since I was able to include it with only a few dependencies, but I never got it to give the same output as the function shown below.

All feedback is welcome, I've been trying to solve it for 3 days now but can't figure it out. Thanks!

*Update: It seems that the function str_to_a32 in the for loop (which loops over every 16 bytes) causes the biggest speed problem, so making that function faster would largely solve the problem. Unfortunately, looping over every 16 bytes is necessary, since I'm implementing the same CBC-MAC function that the cloud provider Mega uses.

The code

    //TEST IMPLEMENTATION

    String _path_to_file = "";

    Random _random = new Random();
    long[] _key_file = new long[4];
    _key_file[0] = _random.nextInt(Integer.MAX_VALUE);
    _key_file[1] = _random.nextInt(Integer.MAX_VALUE);
    _key_file[2] = _random.nextInt(Integer.MAX_VALUE);
    _key_file[3] = _random.nextInt(Integer.MAX_VALUE);

    long[] _iv_file = new long[4];
    _iv_file[0] = _random.nextInt(Integer.MAX_VALUE);
    _iv_file[1] = _random.nextInt(Integer.MAX_VALUE);
    _iv_file[2] = 0;
    _iv_file[3] = 0;

    long[] _returned = cbc_mac(_path_to_file, _key_file, _iv_file);


//FUNCTIONS

//this function loops over the parts of the file to calculate the cbc-mac and is the problem
public static long[] cbc_mac(String _path, long[] k, long[] n) throws Exception {
    File _file = new File(_path);
    long _file_length = _file.length();
    RandomAccessFile _raf = new RandomAccessFile(_file, "r");

    //This works fine and fast
    ArrayList<chunksData> chunks = get_chunks(_file_length);

    long[] file_mac = new long[4];
    file_mac[0] = 0;
    file_mac[1] = 0;
    file_mac[2] = 0;
    file_mac[3] = 0;

    //prepare encrypt
    String iv = "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0";
    IvParameterSpec ivSpec = new IvParameterSpec(iv.getBytes());
    SecretKeySpec keySpec = new SecretKeySpec(a32_to_str(k).getBytes("ISO-8859-1"), "AES");
    Cipher cipher = Cipher.getInstance("AES/CBC/NOPADDING");
    cipher.init(Cipher.ENCRYPT_MODE, keySpec, ivSpec);
    //end prepare encrypt

    for(chunksData _chunksData : chunks) {

        // long, not int: casting the offset to int overflows for files beyond 2 GB
        long pos = _chunksData._key;
        long size = _chunksData._value;

        long[] chunk_mac = new long[4];
        chunk_mac[0] = n[0];
        chunk_mac[1] = n[1];
        chunk_mac[2] = n[0];
        chunk_mac[3] = n[1];

        byte[] bytes = new byte[16];

        //this loop is the really slow part since it loops over every 16 bytes
        for (long i = pos; i < pos + size; i += 16) {
            _raf.seek(i);
            int _did_read = _raf.read(bytes, 0, 16);
            if(_did_read != 16) {
                for(int o = _did_read;o<16;o++) {
                    bytes[o] = (byte)((char)'\0');
                }
            }

            long[] block = str_to_a32(new String(bytes, "ISO-8859-1"));

            chunk_mac[0] = chunk_mac[0] ^ block[0];
            chunk_mac[1] = chunk_mac[1] ^ block[1];
            chunk_mac[2] = chunk_mac[2] ^ block[2];
            chunk_mac[3] = chunk_mac[3] ^ block[3];

            chunk_mac = str_to_a32(new String(cipher.doFinal(a32_to_str(chunk_mac).getBytes("ISO-8859-1")), "ISO-8859-1"));

        }

        file_mac[0] = file_mac[0] ^ chunk_mac[0];
        file_mac[1] = file_mac[1] ^ chunk_mac[1];
        file_mac[2] = file_mac[2] ^ chunk_mac[2];
        file_mac[3] = file_mac[3] ^ chunk_mac[3];
        file_mac = str_to_a32(new String(cipher.doFinal(a32_to_str(file_mac).getBytes("ISO-8859-1")), "ISO-8859-1"));

    }

    _raf.close();

    return file_mac;

}

//this function works fine and fast
public static ArrayList<chunksData> get_chunks(long size) {

    ArrayList<chunksData> chunks = new ArrayList<chunksData>();

    long p = 0;
    long pp = 0;

    for (int i = 1; i <= 8 && p < size - i * 0x20000; i++) {
        chunksData chunks_temp = new chunksData(p, i*0x20000);
        chunks.add(chunks_temp);
        pp = p;
        p += chunks_temp._value;
    }

    while(p < size) {
        chunksData chunks_temp = new chunksData(p, 0x100000);
        chunks.add(chunks_temp);
        pp = p;
        p += chunks_temp._value;            
    }

    chunks.get(chunks.size()-1)._value = size-pp;
    if((int)chunks.get(chunks.size()-1)._value == 0) {
        chunks.remove(chunks.size()-1);
    }

    return chunks;

}

public static class chunksData {
    public long _key = 0;
    public long _value = 0;
    public chunksData(long _keyT, long _valueT){
        this._key = _keyT;
        this._value = _valueT;
    }
}

//helper function which also contains a loop and is used in the problematic loop, so might be a problem though I don't know how to speed it up
public static long[] str_to_a32(String string) {
    if (string.length() % 4 != 0) {
        string += new String(new char[4 - string.length() % 4]);
    }
    long[] data = new long[string.length() / 4];

    byte[] part = new byte[8];
    for (int k = 0, i = 0; i < string.length(); i += 4, k++) {
        String sequence = string.substring(i, i + 4);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try {
            baos.write(sequence.getBytes("ISO-8859-1"));
            System.arraycopy(baos.toByteArray(), 0, part, 4, 4);
            ByteBuffer bb = ByteBuffer.wrap(part);
            data[k] = bb.getLong();
        } catch (IOException e) {
            data[k] = 0;
        }
    }
    return data;
}

//helper function which also contains a loop and is used in the problematic loop, so might be a problem though I don't know how to speed it up
public static String a32_to_str(long[] data) {
    byte[] part = null;
    StringBuilder builder = new StringBuilder();
    ByteBuffer bb = ByteBuffer.allocate(8);
    for (int i = 0; i < data.length; i++) {
        bb.putLong(data[i]);
        part = copyOfRange(bb.array(), 4, 8);
        bb.clear();
        ByteArrayInputStream bais = new ByteArrayInputStream(part);
        while (bais.available() > 0) {
            builder.append((char) bais.read());
        }
    }
    return builder.toString();
}

Solution

  • My main suspect is the seek operation in your first loop, combined with processing only 16 bytes at a time. I don't know the algorithm, but your code suggests that reading a full "chunk" is possible, after which you can process it in parts as necessary.

    Also, the chunks seem to be sequential (unless I'm missing something), so the whole file could be read sequentially without the seek.

    You don't need the ByteArrayOutputStream in your helper method. Making substrings also has a cost, so calling getBytes on the whole string and then picking the parts out of the byte array will be more efficient.

    The code below is roughly two times faster than the original.

    public long[] fast_str_to_a32(String string) throws UnsupportedEncodingException {
        if (string.length() % 4 != 0) {
            string += new String(new char[4 - string.length() % 4]);
        }
        long[] data = new long[string.length() / 4];
    
        byte[] bytes = string.getBytes("ISO-8859-1");
    
        byte[] part = new byte[8];
        ByteBuffer bb = ByteBuffer.wrap(part); 
        for (int k = 0, i = 0; i < bytes.length; i += 4, k++) {
            System.arraycopy(bytes, i, part, 4, 4);
            bb.rewind();
            data[k] = bb.getLong();
        }
        return data;
    }
    

    Also, in the main method you convert the bytes to a string only to convert them back to byte[] at the beginning of str_to_a32; you should just use byte[] as the input of this method.
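
    As a rough sketch of that idea, the conversion can work on byte[] directly and skip String entirely. The class and method names here (A32Bytes, bytesToA32, a32ToBytes) are made up for illustration, and the sketch assumes the input length is a multiple of 4, which holds for 16-byte AES blocks. It relies on ByteBuffer being big-endian by default, which matches the byte order the original helpers produce.

```java
import java.nio.ByteBuffer;

public class A32Bytes {
    // Hypothetical replacement for str_to_a32 that takes bytes instead of a String.
    // Each big-endian 32-bit word is stored unsigned in a long.
    // Assumes bytes.length is a multiple of 4.
    public static long[] bytesToA32(byte[] bytes) {
        long[] data = new long[bytes.length / 4];
        ByteBuffer bb = ByteBuffer.wrap(bytes); // big-endian by default
        for (int k = 0; k < data.length; k++) {
            data[k] = bb.getInt() & 0xFFFFFFFFL; // mask to keep the value unsigned
        }
        return data;
    }

    // Hypothetical replacement for a32_to_str that returns bytes instead of a String.
    // Only the low 32 bits of each long are written, as in the original helper.
    public static byte[] a32ToBytes(long[] data) {
        ByteBuffer bb = ByteBuffer.allocate(data.length * 4);
        for (long word : data) {
            bb.putInt((int) word);
        }
        return bb.array();
    }
}
```

    With helpers like these, the result of cipher.doFinal could be passed straight to bytesToA32 without any String or charset round trip.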

    I still believe that you should read the whole chunk at once and then process it in blocks of 16 bytes.
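
    A minimal sketch of what the per-chunk loop could look like once the chunk is in memory, keeping the running MAC as a byte[16] instead of long[4]. All names (ChunkMacSketch, newBlockCipher, chunkMac, nonce8) are hypothetical. Two assumptions to check against your reference output before relying on it: nonce8 holds the 8 bytes of n[0] and n[1] in big-endian order, and "AES/ECB/NoPadding" is swapped in for your "AES/CBC/NOPADDING" with an all-zero IV, which should give the same result when exactly one block is encrypted per doFinal call.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class ChunkMacSketch {
    // Creates a single-block AES encryptor for a 16-byte key.
    public static Cipher newBlockCipher(byte[] key16) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/ECB/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key16, "AES"));
        return cipher;
    }

    // Computes the MAC of one chunk that has already been read into memory.
    // 'length' is the number of valid bytes in 'chunk'.
    public static byte[] chunkMac(Cipher cipher, byte[] chunk, int length, byte[] nonce8)
            throws GeneralSecurityException {
        byte[] mac = new byte[16];
        // Start value n0 n1 n0 n1, as in the question's chunk_mac initialisation.
        System.arraycopy(nonce8, 0, mac, 0, 8);
        System.arraycopy(nonce8, 0, mac, 8, 8);

        for (int off = 0; off < length; off += 16) {
            int n = Math.min(16, length - off);
            // XOR the block into the running MAC. A short final block is
            // implicitly zero-padded, because XOR with zero changes nothing.
            for (int j = 0; j < n; j++) {
                mac[j] ^= chunk[off + j];
            }
            // Encrypt in place; input and output may be the same array.
            cipher.doFinal(mac, 0, 16, mac, 0);
        }
        return mac;
    }
}
```

    This avoids the per-block seek, the String conversions and most allocations; the file-level MAC could be combined from the chunk MACs in the same way.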

    There is also a potential problem in your code: you try to read 16 bytes, and if you get fewer you start padding. However, the contract for read is "An attempt is made to read as many as len bytes, but a smaller number may be read." Typically the smaller number happens at the end of the file, but in principle it may happen at any time. If it does, you will start padding in the middle of the stream and corrupt your blocks completely.
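
    One way to guard against short reads is to keep reading until the buffer is full or the stream really ends. The helper below is only a sketch with made-up names (ReadFullySketch, readFully); it works on any InputStream, so it would also fit a sequential read through a BufferedInputStream.

```java
import java.io.IOException;
import java.io.InputStream;

public class ReadFullySketch {
    // Reads until 'len' bytes are in 'buf' or the stream ends.
    // Returns the number of bytes actually read, which is smaller than
    // 'len' only when the end of the stream was reached.
    public static int readFully(InputStream in, byte[] buf, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int r = in.read(buf, total, len - total);
            if (r < 0) {
                break; // end of stream
            }
            total += r;
        }
        return total;
    }
}
```

    Only when the returned count is smaller than the requested length has the end of the data been reached, so that is the only place where padding should start. RandomAccessFile also has its own readFully method, but that one throws an EOFException if the file ends before the buffer is filled, so it suits only the chunks whose full size is known to be available.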