Tags: delphi, delphi-7, endianness

How do I read integers from a big-endian binary file when Windows/Delphi assumes little-endian order?


I am very confused. I need to read binary files (.fsa extension, the ABIF format by Applied Biosystems) and I ran into a problem reading signed integers. I am following this manual: https://drive.google.com/file/d/1zL-r6eoTzFIeYDwH5L8nux2lIlsRx3CK/view?usp=sharing As an example, let's look at the fDataSize field in the header of this file: https://drive.google.com/file/d/1rrL01B_gzgBw28knvFit6hUIA5jcCDry/view?usp=sharing

I know that it is supposed to be 2688 (according to the manual, it is a signed integer of 32 bits), which is 00000000 00000000 00001010 10000000 in binary form. Indeed, when I read these 32 bits as an array of 4 bytes, I get [0, 0, 10, -128], which is exactly the same in binary form.

However, if I read it as Integer, it results in 16809994, which is 00000001 00000000 10000000 00001010 in bits.

From what I gathered on several forums, people use the Swap and htonl functions to convert integers between little-endian and big-endian order, and they recommend the BSWAP EAX instruction for 32-bit integers. In this case, however, neither gives the expected result: Swap applied to 16809994 returns 16779904 (00000001 00000000 00001010 10000000), and BSWAP converts 16809994 to 176160769 (00001010 10000000 00000000 00000001).

As we can see, the built-in functions do not produce what I need. Swap comes close to the correct result, but for some reason reading these bits as an Integer changes the left-most byte. So what is wrong, and what should I do?

Update 1: To store the header data I use the following record:

type
  TFasMainHeader = record
    fFrmt        : array[1..4]  of ansiChar;
    fVersion     : Word;
    fDir         : array[1..4] of ansiChar;
    fNumber      : array[1..4]  of Byte; //
    fElType      : Word;
    fElSize      : Word;
    fNumEls      : array[1..4]  of Byte; //
    fDataSize    : Integer;
    fDataOffset  : Integer;
    fDO : word;
    fDataHandle  : array[1..98]  of Byte;
  end;

Then upon the button click I perform the following:

aFileStream.Read(fas_main_header, SizeOf(TFasMainHeader));
with fas_main_header do begin
    if fFrmt <> 'ABIF' then raise Exception.Create('Not an ABIF file!');
    fVersion := Swap(fVersion);
    fElType := Swap(fElType);
    fElSize := Swap(fElSize);
...

Next I need to swap Int32 variables in the right way, but at this point fDataSize, for example, is 16809994. See the state of the record in detail during debugging:

[screenshot: state of the record in the debugger]

It doesn't make sense to me, since there shouldn't be a 1 bit in the top byte of the fDataSize value (it also throws off the BSWAP result).

See the binary structure of the beginning of the file (the fDataSize bytes are highlighted): [screenshot: bytes at the start of the file]


Solution

  • The problem is not endianness but the memory layout of Delphi records.

    You have

    type
      TFasMainHeader = record
        fFrmt        : array[1..4]  of ansiChar;
        fVersion     : Word;
        fDir         : array[1..4] of ansiChar;
        fNumber      : array[1..4]  of Byte; //
        fElType      : Word;
        fElSize      : Word;
        fNumEls      : array[1..4]  of Byte; //
        fDataSize    : Integer;
        fDataOffset  : Integer;
        fDO : word;
        fDataHandle  : array[1..98]  of Byte;
      end;
    

    and you expect this record to overlay the bytes in your file, with fDataSize "on top of" 00 00 0A 80.

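    By default, however, the Delphi compiler adds padding between the fields of a record so that each one is properly aligned. The fields before fDataSize occupy 22 bytes, and a 4-byte Integer must start at a multiple of 4, so the compiler inserts two padding bytes and places fDataSize at offset 24 instead of 22. It therefore overlays 0A 80 plus the first two bytes of fDataOffset (0A 80 00 01), which is exactly the 16809994 you observed.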

    To fix this, use the packed keyword:

    type
      TFasMainHeader = packed record
        fFrmt        : array[1..4]  of ansiChar;
        fVersion     : Word;
        fDir         : array[1..4] of ansiChar;
        fNumber      : array[1..4]  of Byte; //
        fElType      : Word;
        fElSize      : Word;
        fNumEls      : array[1..4]  of Byte; //
        fDataSize    : Integer;
        fDataOffset  : Integer;
        fDO : word;
        fDataHandle  : array[1..98]  of Byte;
      end;
    

    Then the fields will be at the expected locations.
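
    The shift can be checked directly by comparing the offset of fDataSize in both layouts. This is only a minimal sketch: the type names TPlainHeader and TPackedHeader and the variables are made up for the illustration, and it assumes the default {$A8} alignment setting, which is stated explicitly in the code.

    ```pascal
    program OffsetDemo;
    {$APPTYPE CONSOLE}
    {$A8} // assumption: default record alignment, made explicit here

    type
      // Same fields as in the question, default (aligned) layout.
      TPlainHeader = record
        fFrmt       : array[1..4] of AnsiChar;
        fVersion    : Word;
        fDir        : array[1..4] of AnsiChar;
        fNumber     : array[1..4] of Byte;
        fElType     : Word;
        fElSize     : Word;
        fNumEls     : array[1..4] of Byte;
        fDataSize   : Integer;
        fDataOffset : Integer;
        fDO         : Word;
        fDataHandle : array[1..98] of Byte;
      end;

      // Identical fields, but no padding between them.
      TPackedHeader = packed record
        fFrmt       : array[1..4] of AnsiChar;
        fVersion    : Word;
        fDir        : array[1..4] of AnsiChar;
        fNumber     : array[1..4] of Byte;
        fElType     : Word;
        fElSize     : Word;
        fNumEls     : array[1..4] of Byte;
        fDataSize   : Integer;
        fDataOffset : Integer;
        fDO         : Word;
        fDataHandle : array[1..98] of Byte;
      end;

    var
      PlainHdr  : TPlainHeader;
      PackedHdr : TPackedHeader;
    begin
      // Offset of a field = address of the field minus address of the record.
      WriteLn('fDataSize offset, plain : ',
        PAnsiChar(@PlainHdr.fDataSize) - PAnsiChar(@PlainHdr));   // 24
      WriteLn('fDataSize offset, packed: ',
        PAnsiChar(@PackedHdr.fDataSize) - PAnsiChar(@PackedHdr)); // 22
      WriteLn('SizeOf plain : ', SizeOf(TPlainHeader));
      WriteLn('SizeOf packed: ', SizeOf(TPackedHeader));
    end.
    ```

    If the two offsets differ, a single Read into the unpacked record cannot line up with the file.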

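    And then -- of course -- you can use any method you like to swap the byte order. Note that Swap only exchanges the two low-order bytes (as your own test shows), so it suits the Word fields but not the 32-bit ones.

    For those, preferably use the BSWAP instruction.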
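
    For the 32-bit fields, a byte swap could look roughly like the sketch below. The names Swap32 and Swap32Asm are hypothetical helpers, not RTL functions. The assembler variant assumes the 32-bit Delphi compiler with its default register calling convention (argument and result both in EAX), so it is guarded by CPU386. The four sample bytes are the fDataSize bytes from the question.

    ```pascal
    program SwapDemo;
    {$APPTYPE CONSOLE}

    // Reverses the four bytes of a 32-bit value using shifts and masks.
    function Swap32(Value: LongWord): LongWord;
    begin
      Result := (Value shl 24) or
                ((Value and $0000FF00) shl 8) or
                ((Value and $00FF0000) shr 8) or
                (Value shr 24);
    end;

    {$IFDEF CPU386}
    // Same operation with BSWAP. Assumption: 32-bit Delphi, register
    // calling convention, so Value arrives in EAX and EAX is the result.
    function Swap32Asm(Value: LongWord): LongWord;
    asm
      BSWAP EAX
    end;
    {$ENDIF}

    const
      // fDataSize exactly as stored in the file (big-endian 2688).
      FileBytes: array[0..3] of Byte = ($00, $00, $0A, $80);
    var
      Raw: LongWord;
    begin
      // Copy the bytes as a stream read into a packed record would.
      Move(FileBytes, Raw, SizeOf(Raw));
      WriteLn(Integer(Swap32(Raw)));      // 2688 on a little-endian CPU
    {$IFDEF CPU386}
      WriteLn(Integer(Swap32Asm(Raw)));   // 2688
    {$ENDIF}
    end.
    ```

    The cast to Integer after swapping keeps the field signed, as the manual specifies.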