c# performance bitwise-operators bitwise-and

Bitwise operation performance, how to improve

I have a simple task: determine how many bytes are needed to encode a number (the length of a byte array), then write that encoded length followed by the value itself (an implementation of this article: Encoded Length and Value Bytes).
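To make the target format concrete, here is a minimal sketch of just the length field, assuming the usual rule that lengths below 128 take one byte and longer ones take a prefix byte (0x80 plus the byte count) followed by the length in big-endian order. The class and method names are hypothetical and are not part of the code in this question.

```csharp
using System;

public static class LengthEncodingDemo {
    // Hypothetical helper: returns only the encoded length field (no tag, no payload).
    public static byte[] EncodeLength(int length) {
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        // Short form: the length fits in 7 bits and is stored directly.
        if (length < 128) return new[] { (byte)length };
        // Long form: count how many bytes the length value occupies.
        int count = 1;
        for (int rest = length >> 8; rest != 0; rest >>= 8) count++;
        byte[] result = new byte[count + 1];
        // Prefix byte: bit 7 set, low bits hold the number of length bytes.
        result[0] = (byte)(0x80 | count);
        // Fill from the end so the most significant byte comes first.
        for (int i = count; i >= 1; i--) {
            result[i] = (byte)(length & 0xFF);
            length >>= 8;
        }
        return result;
    }
}
```

For example, `BitConverter.ToString(LengthEncodingDemo.EncodeLength(100000))` gives `83-01-86-A0`, and a length of 200 gives `81-C8`.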

Originally I wrote a quick method that accomplishes the task:

public static Byte[] Encode(Byte[] rawData, Byte enclosingtag) {
    if (rawData == null) {
        return new Byte[] { enclosingtag, 0 };
    }
    List<Byte> computedRawData = new List<Byte> { enclosingtag };
    // if array size is less than 128, encode length directly. No questions here
    if (rawData.Length < 128) {
        computedRawData.Add((Byte)rawData.Length);
    } else {
        // convert array size to a hex string
        String hexLength = rawData.Length.ToString("x2");
        // if hex string has odd length, align it to even by prepending hex string
        // with '0' character
        if (hexLength.Length % 2 == 1) { hexLength = "0" + hexLength; }
        // take a pair of hex characters and convert each octet to a byte
        Byte[] lengthBytes = Enumerable.Range(0, hexLength.Length)
                .Where(x => x % 2 == 0)
                .Select(x => Convert.ToByte(hexLength.Substring(x, 2), 16))
                .ToArray();
        // insert padding byte, set bit 7 to 1 and add byte count required
        // to encode length bytes
        Byte paddingByte = (Byte)(128 + lengthBytes.Length);
        computedRawData.Add(paddingByte);
        computedRawData.AddRange(lengthBytes);
    }
    computedRawData.AddRange(rawData);
    return computedRawData.ToArray();
}

This is old code and it was written in an awful way.

Now I'm trying to optimize the code by using either bitwise operators or the BitConverter class. Here is the bitwise version:

public static Byte[] Encode2(Byte[] rawData, Byte enclosingtag) {
    if (rawData == null) {
        return new Byte[] { enclosingtag, 0 };
    }
    List<Byte> computedRawData = new List<Byte>(rawData);
    if (rawData.Length < 128) {
        computedRawData.Insert(0, (Byte)rawData.Length);
    } else {
        // temp number
        Int32 num = rawData.Length;
        // track byte count, this will be necessary further
        Int32 counter = 1;
        // simply make bitwise AND to extract byte value
        // and shift right while remaining value is still more than 255
        // (there are more than 8 bits)
        while (num >= 256) {
            counter++;
            computedRawData.Insert(0, (Byte)(num & 255));
            num = num >> 8;
        }
        // compose final array
        computedRawData.InsertRange(0, new[] { (Byte)(128 + counter), (Byte)num });
    }
    computedRawData.Insert(0, enclosingtag);
    return computedRawData.ToArray();
}

And the final implementation, with the BitConverter class:

public static Byte[] Encode3(Byte[] rawData, Byte enclosingtag) {
    if (rawData == null) {
        return new Byte[] { enclosingtag, 0 };
    }
    List<Byte> computedRawData = new List<Byte>(rawData);
    if (rawData.Length < 128) {
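        computedRawData.Insert(0, (Byte)rawData.Length);
        // without these two lines the short form would fall through to "return null"
        computedRawData.Insert(0, enclosingtag);
        return computedRawData.ToArray();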
    } else {
        // convert integer to a byte array (least significant byte first;
        // this code assumes a little-endian platform)
        Byte[] bytes = BitConverter.GetBytes(rawData.Length);
        // start from the end of a byte array to skip unnecessary zero bytes
        for (int i = bytes.Length - 1; i >= 0; i--) {
            // once the byte value is non-zero, take everything starting
            // from the current position up to array start.
            if (bytes[i] > 0) {
                // we need to reverse the array to get the proper byte order
                computedRawData.InsertRange(0, bytes.Take(i + 1).Reverse());
                // compose final array
                computedRawData.Insert(0, (Byte)(128 + i + 1));
                computedRawData.Insert(0, enclosingtag);
                return computedRawData.ToArray();
            }
        }
    }
    return null;
}

All methods do their work as expected. I used the example from the Stopwatch class page to measure performance, and the results surprised me. My test performed 1000 runs of each method, encoding a byte array with 100 000 elements (in practice only the array size gets encoded), and the average times are:

Encode -- around 200ms
Encode2 -- around 270ms
Encode3 -- around 320ms
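For reference, a measurement like the one described above could look roughly like this. It is a minimal sketch, assuming each method matches the signature of Encode; the class and method names are hypothetical, and it reports the mean time per call, which may or may not be how the figures above were computed.

```csharp
using System;
using System.Diagnostics;

public static class EncodeTiming {
    // Hypothetical harness: times 'runs' calls of 'encode' on a zero-filled payload.
    public static double AverageMilliseconds(Func<byte[], byte, byte[]> encode, int payloadSize, int runs) {
        byte[] payload = new byte[payloadSize];
        // One untimed call first, so JIT compilation is not part of the measurement.
        encode(payload, 0x30);
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++) {
            encode(payload, 0x30);
        }
        sw.Stop();
        // Mean wall-clock time per call, in milliseconds.
        return sw.Elapsed.TotalMilliseconds / runs;
    }
}
```

A call such as `EncodeTiming.AverageMilliseconds(Encode2, 100000, 1000)` would then mirror the test described in the question.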

I personally like method Encode2, because the code looks more readable, but its performance isn't that good.

The question: what would you suggest to improve the performance of Encode2, or the readability of Encode?

Any help will be appreciated.

===========================

Update: Thanks to all who participated in this thread. I took into consideration all suggestions and ended up with this solution:

public static Byte[] Encode6(Byte[] rawData, Byte enclosingtag) {
    if (rawData == null) {
        return new Byte[] { enclosingtag, 0 };
    }
    Byte[] retValue;
    if (rawData.Length < 128) {
        retValue = new Byte[rawData.Length + 2];
        retValue[0] = enclosingtag;
        retValue[1] = (Byte)rawData.Length;
        // copy the payload after the two header bytes
        rawData.CopyTo(retValue, 2);
    } else {
        Byte[] lenBytes = new Byte[3];
        Int32 num = rawData.Length;
        Int32 counter = 0;
        while (num >= 256) {
            lenBytes[counter] = (Byte)(num & 255);
            num >>= 8;
            counter++;
        }
        // 3 is: enclosing tag, length-prefix byte and the most significant length byte
        retValue = new byte[rawData.Length + 3 + counter];
        rawData.CopyTo(retValue, 3 + counter);
        retValue[0] = enclosingtag;
        retValue[1] = (Byte)(129 + counter);
        retValue[2] = (Byte)num;
        Int32 n = 3;
        for (Int32 i = counter - 1; i >= 0; i--) {
            retValue[n] = lenBytes[i];
            n++;
        }
    }
    return retValue;
}

Eventually I moved away from lists to fixed-size byte arrays. The average time against the same data set is now about 65ms. It appears that the lists (not the bitwise operations) were giving me a significant performance penalty.

Solution

The main problem here is almost certainly the allocation of the List, the allocation needed when you insert new elements, and the copy made when the list is converted to an array at the end. This code probably spends most of its time in the garbage collector and memory allocator. Whether or not you use bitwise operators probably matters very little in comparison, so I would first look into ways to reduce the amount of memory you allocate.

One way is to pass in a reference to a byte array allocated in advance, together with an index to your current position in that array, instead of allocating and returning the data; the method then returns an integer telling how many bytes it has written. Working on large arrays is usually more efficient than working on many small objects. As others have mentioned, use a profiler and see where your code spends its time.
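The idea of writing into a caller-supplied buffer might look something like the following. This is only a sketch under assumptions: the class name, method name and parameter order are hypothetical, the caller is assumed to have made the buffer large enough, and the length bytes are written most significant first as in the question's code.

```csharp
using System;

public static class TlvWriter {
    // Hypothetical helper: writes tag, length and value into 'buffer' starting at 'offset'
    // and returns the number of bytes written. Nothing is allocated here.
    public static int EncodeInto(byte[] rawData, byte enclosingTag, byte[] buffer, int offset) {
        int start = offset;
        int length = rawData == null ? 0 : rawData.Length;
        buffer[offset++] = enclosingTag;
        if (length < 128) {
            // Short form: the length is stored directly.
            buffer[offset++] = (byte)length;
        } else {
            // Long form: count the bytes the length occupies.
            int lengthBytes = 1;
            for (int rest = length >> 8; rest != 0; rest >>= 8) lengthBytes++;
            buffer[offset++] = (byte)(0x80 | lengthBytes);
            // Write the length bytes, most significant first.
            for (int shift = (lengthBytes - 1) * 8; shift >= 0; shift -= 8) {
                buffer[offset++] = (byte)((length >> shift) & 0xFF);
            }
        }
        if (length > 0) {
            // Single block copy of the payload into the shared buffer.
            Buffer.BlockCopy(rawData, 0, buffer, offset, length);
            offset += length;
        }
        return offset - start;
    }
}
```

The caller can then encode several values back to back into one large array, advancing its own index by the returned count after each call.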

Of course, the optimization I mentioned will make your code more low-level in nature, and closer to what you would typically do in C, but there is often a trade-off between readability and performance.