modifying items in byte[] arrays using loops (C#)


I read a byte[] array from an input file (an ASCII text file) and I'm trying to delete all bytes with a given hex value, without converting the array to a string and without using lists.

This is the code I came up with so far:

byte[] byteLine = File.ReadAllBytes(filePath);
byte lineBreak = (byte)0x0d;
int breakIndex = Array.IndexOf(byteLine, lineBreak);

Those are the variables I thought I'd need for this to work. Here I found a method that does something similar to what I'm trying to do, so I added the code (as is) to the project and tried calling it in a loop. First, a for loop:

for(int i = 0; i < byteLine.Length; i++) {

    byte v = byteLine[i];
    if (v == lineBreak) {
    
        RemoveRange(byteLine, breakIndex, 2);
        
    }
}

I was trying to go through every byte in the array until one equal to the line break byte appears, then call the method so that, starting at the line break index (breakIndex), it deletes 2 bytes (because each line break in the file takes 2 bytes, 0d 0a). It was supposed to cycle through the whole array and repeat this operation for every line break (lineBreak). That's what I thought a for(;;) loop would do, but it didn't, so I must have made a mistake. Second try, using foreach:

int t = breakIndex;
foreach (byte b in byteLine) {
    if (b == lineBreak) {
        while (t != 0)
        {
            RemoveRange(byteLine, breakIndex, 2);
            t--;
        }
    }        
}

In the second loop I added a variable (t) to use as a "counter", which starts equal to the line break index (breakIndex). I know the input file is a square ASCII image, so it has as many lines as line break symbols (minus one, because the last char of the last line isn't a break). The loop should therefore have cycled a number of times equal to the number of line breaks, or until t reached 0, since it goes down by 1 on every iteration. Same issue in this loop: the method didn't trigger. This is the third and last thing I tried:

int t = 0;
foreach (byte b in byteLine) {
    if (b == lineBreak) {
        do
        {
            RemoveRange(byteLine, breakIndex, 2);
            t++;
        }
        while (t < breakIndex);
    }
}

This is similar to the other foreach loop, but it counts up rather than down: it should have kept cycling until t reached the total number of line breaks. I don't even know if I'm using do and while correctly; it's my first time trying to use do. That said, I tested the method outside the loop and it works perfectly; it just deletes only the first line break, since it wasn't meant to loop on its own. After some searching I found this answer.

From reading that answer I understand that there should be a way to directly edit items in a byte[] array without many workarounds. The issue is that I don't know how to apply that solution to my problem, since I'm not familiar with those C# functions.


Solution

  • Here is an implementation processing the array in-place.

    const byte CR = 0x0d, LF = 0x0a;
    
    byte[] byteLine = [1, 2, 3, 4, CR, LF, 5, 6, 7, CR, LF, CR, LF, 8, 9];
    
    int firstCR = Array.IndexOf(byteLine, CR);
    if (firstCR >= 0) {
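        // Bytes before the first CR are already in place, so start copying just after it
        int destination = firstCR, source = firstCR + 2;
        while (source < byteLine.Length) {
            if (byteLine[source] == CR) {
                // Skip the CR and the LF assumed to follow it
                source += 2;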
            } else {
                byteLine[destination++] = byteLine[source++];
            }
        }
        // Fill remaining bytes with 0
        for (int i = destination; i < byteLine.Length; i++) {
            byteLine[i] = 0;
        }
    }
    

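    Note, however, that this is far from a one-liner and is more likely to contain an error than a LINQ expression such as byteLine.Where(b => b != lineBreak).ToArray(). (As written, that expression removes only the bytes equal to lineBreak, i.e. the CR bytes; to drop the LF bytes as well, the predicate has to exclude both values.)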

    It also does not make sense to make optimizations like this without benchmarking them. Solutions that look slow are sometimes faster than expected. Microsoft puts a lot of effort into speeding up loops and LINQ queries. In some cases LINQ uses vectorization, which can beat a hand-written for loop that was assumed to be faster.
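
As a minimal sketch of the LINQ approach mentioned above, the following filters out both CR and LF bytes and produces a new, shorter array instead of zero-filling the tail. It reuses the sample data from the in-place example. The name stripped is illustrative only, and excluding two values rather than one is an assumption about what should be removed.

```csharp
using System;
using System.Linq;

const byte CR = 0x0d, LF = 0x0a;

// Same sample data as the in-place example.
byte[] byteLine = { 1, 2, 3, 4, CR, LF, 5, 6, 7, CR, LF, CR, LF, 8, 9 };

// Keep every byte that is neither CR nor LF. The original array is left
// untouched; the result is a new array containing only the kept bytes.
byte[] stripped = byteLine.Where(b => b != CR && b != LF).ToArray();

Console.WriteLine(string.Join(" ", stripped)); // prints: 1 2 3 4 5 6 7 8 9
```

Because the predicate looks at each byte independently, this variant also removes a lone LF or a lone CR, which the in-place loop above does not handle.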

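To act on the advice to measure first, here is a rough timing sketch, not a rigorous benchmark. StripInPlace is a hypothetical helper written for this comparison, and the synthetic input (1000 lines of 1000 bytes, each ending in CR LF) is an assumption standing in for the real file. A single Stopwatch run ignores JIT warm-up and measurement noise, so the numbers are only indicative; a dedicated harness such as BenchmarkDotNet would give more trustworthy results.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

const byte CR = 0x0d, LF = 0x0a;

// Hypothetical helper: compacts the kept bytes to the front of the array,
// then returns a copy trimmed to the new length. The argument is overwritten.
byte[] StripInPlace(byte[] data)
{
    int destination = 0;
    for (int source = 0; source < data.Length; source++)
    {
        byte current = data[source];
        if (current != CR && current != LF)
        {
            data[destination++] = current;
        }
    }
    Array.Resize(ref data, destination); // allocates the shorter result array
    return data;
}

// Synthetic input (an assumption): 1000 lines of 1000 'x' bytes plus CR LF.
byte[] input = Enumerable.Range(0, 1000)
    .SelectMany(_ => Enumerable.Repeat((byte)'x', 1000).Append(CR).Append(LF))
    .ToArray();

var stopwatch = Stopwatch.StartNew();
byte[] viaLinq = input.Where(b => b != CR && b != LF).ToArray();
stopwatch.Stop();
Console.WriteLine($"LINQ: {stopwatch.Elapsed.TotalMilliseconds} ms");

byte[] copy = (byte[])input.Clone(); // StripInPlace overwrites its argument
stopwatch.Restart();
byte[] viaLoop = StripInPlace(copy);
stopwatch.Stop();
Console.WriteLine($"Loop: {stopwatch.Elapsed.TotalMilliseconds} ms");

// Sanity check: both approaches should produce the same bytes.
Console.WriteLine(viaLinq.SequenceEqual(viaLoop));
```

Which variant comes out ahead depends on the runtime version, the input size, and the hardware, so it is worth running this against a realistic file before committing to the more complex loop.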