I am trying to figure out on which data the crc32 field in the header of a RAR Recovery Record is based. I am trying to recreate a RAR volume based on a previous RAR volume and the extracted contents. I am up to the point where only 12 bytes differ from the correct/original volume.
The field names used below are taken from the unrar source code (arcread.cpp) or the RAR technote.
A RAR file consists of blocks. They have a header and a body:
[header][body]
The header contains metadata that describes the body. One of these block types is the file header (HEAD_TYPE=0x74, "File in archive"):
[header:a...FILE_CRC...z][body]
The field FILE_CRC (4 bytes) is calculated on all the data available in the [body], which is a stored or compressed file.
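To make the baseline concrete, here is a minimal sketch of that calculation. It assumes the file is stored (not compressed), so the body bytes and the file bytes are the same, and it assumes the standard CRC-32 as exposed by Python's `zlib.crc32`. The helper name `file_crc` is only for illustration.

```python
import zlib

def file_crc(body: bytes) -> int:
    """CRC-32 of the block body as an unsigned 32-bit value.

    Assumption: the body is a stored (uncompressed) file.
    """
    return zlib.crc32(body) & 0xFFFFFFFF

# Standard CRC-32 check value for the ASCII string "123456789".
print(hex(file_crc(b"123456789")))  # 0xcbf43926
```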
The block of a Recovery Record (HEAD_TYPE=0x7a subblock) is very similar to a file block, but it contains three extra fields in the header:
[header:a...FILE_CRC...z, "Protect+", rsc, dsc][body]
rsc: recovery sector count (4 bytes)
dsc: data sector count (8 bytes)
assert dsc*2 + rsc*512 == size([body])
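The layout above can be sketched as a small parser. This is only an illustration under a few assumptions that are not stated in the question: the fields are little-endian, the three extra fields are packed back to back (20 bytes), and the body holds the 2-byte-per-data-sector part first, followed by the 512-byte recovery sectors. The function names are hypothetical.

```python
import struct

SECTOR_SIZE = 512   # bytes per recovery sector, from the size check above
PER_DATA_SECTOR = 2 # bytes stored per data sector, from the size check above

def parse_rr_fields(extra: bytes):
    """Read "Protect+", rsc (4 bytes) and dsc (8 bytes) from the extra header bytes.

    Assumption: little-endian, no padding between the fields.
    """
    marker, rsc, dsc = struct.unpack("<8sIQ", extra[:20])
    if marker != b"Protect+":
        raise ValueError("not a recovery record header")
    return rsc, dsc

def split_rr_body(body: bytes, rsc: int, dsc: int):
    """Split the body into the per-data-sector part and the recovery sectors.

    Assumption: the per-data-sector part comes first in the body.
    """
    table_len = dsc * PER_DATA_SECTOR
    if table_len + rsc * SECTOR_SIZE != len(body):
        raise ValueError("body size does not match rsc/dsc")
    return body[:table_len], body[table_len:]
```

For example, `split_rr_body(body, *parse_rr_fields(extra))` returns the two parts of the body, or raises if the size check from the question does not hold.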
You would think the FILE_CRC of this block is based on the data in the body, just like for the file block, but this isn't the case (this was verified independently by another person). So my question is: what data is used to calculate this crc32?
Some things I have tried already:
Instead of using the default seed (-0x1 or 0xFFFFFFFF):
crc = crc32(data)
crc = crc32(data, ~0xffffffff)
an F was dropped, so the seed is 0x0FFFFFFF (passed to crc32 as its complement, -0x10000000):
crc = crc32(data, ~0x0fffffff)
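The relation between the "seed" and the value passed to crc32 can be checked with a short sketch. It assumes `crc32` behaves like Python's `zlib.crc32`, whose second argument is the complement of the internal starting register. The bitwise function `crc32_with_register` is a hypothetical helper written only to make the register explicit.

```python
import zlib

def crc32_with_register(data: bytes, init: int) -> int:
    """Bitwise reflected CRC-32 (polynomial 0xEDB88320) with an explicit
    initial register value; the final XOR with 0xFFFFFFFF is unchanged."""
    reg = init & 0xFFFFFFFF
    for byte in data:
        reg ^= byte
        for _ in range(8):
            reg = (reg >> 1) ^ (0xEDB88320 if reg & 1 else 0)
    return reg ^ 0xFFFFFFFF

data = b"123456789"

# Default seed: register 0xFFFFFFFF corresponds to zlib start value 0.
assert crc32_with_register(data, 0xFFFFFFFF) == zlib.crc32(data)

# Seed with one F dropped: register 0x0FFFFFFF corresponds to
# zlib start value ~0x0FFFFFFF, masked to 32 bits.
start = ~0x0FFFFFFF & 0xFFFFFFFF
assert start == 0xF0000000
assert crc32_with_register(data, 0x0FFFFFFF) == zlib.crc32(data, start)
```

Masking with `& 0xFFFFFFFF` avoids passing a negative start value, which may not be handled the same way across Python versions.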
I sent an email to the author and received the following response:
As far as I can see quickly looking into RAR code, this is CRC32 of all CRC16 data and all recovery record parity sectors ("All the RR data" in your list).
Note that while RAR stores this checksum, it does not use it anywhere. It is not necessary when recovering. Even if recovery record is partially damaged, its valid parts still can be used to recover data. We can check the repair success on per sector basis using CRC16, so one CRC32 covering all data is not required in recovery process.
Eugene
As first thought, the FILE_CRC of the block is based on the data in the body. It looks as if there is a typo somewhere in the RAR code.
XADRARParser.m of TheUnarchiver2.7.1_src has the following commented code:
// Removed CRC checking because RAR uses it completely inconsitently
/* if(block.crc!=0x6152||block.type!=0x72||block.flags!=0x1a21||block.headersize!=7)
...
Almost 3 years later I found out that someone else had already found the solution to this problem earlier that year.
# Why is this odd CRC initialiser used?
crc = crc32(rr_crcs, 0xF0000000)
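Putting this together with the author's description, a hedged sketch of the whole calculation might look like the following. It assumes that `rr_crcs` is the CRC16 part of the body, that the running CRC is then continued over the recovery (parity) sectors, and that `crc32` is `zlib.crc32`. The function name `rr_file_crc` and the parameter `rr_sectors` are hypothetical, and the result has not been checked against a real archive here.

```python
import zlib

RR_CRC_START = 0xF0000000  # the odd initialiser from the line above

def rr_file_crc(rr_crcs: bytes, rr_sectors: bytes) -> int:
    """Candidate FILE_CRC for a recovery record block.

    Assumption: CRC-32 over the CRC16 data followed by the parity
    sectors, started from 0xF0000000 instead of the default 0.
    """
    crc = zlib.crc32(rr_crcs, RR_CRC_START)
    crc = zlib.crc32(rr_sectors, crc)  # continue the running CRC
    return crc & 0xFFFFFFFF
```

Because CRC-32 can be computed incrementally, this is the same as `zlib.crc32(rr_crcs + rr_sectors, 0xF0000000)`, that is, a CRC over the whole body with the non-default start value.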