Tags: c++, byte, mp3, id3, id3v2

ID3v2, wrong APIC frame size calculation


The data size of each ID3v2 frame is stored in bytes 4, 5, 6 and 7 of the frame header, according to the documentation.

I am reading the frame sizes from an mp3 file.

To convert the frame data size to an integer, I use the following code:

int frame_data_sz = b4 << 21 | b5 << 14 | b6 << 7 | b7;

For most frames, this code calculates the frame size correctly:

TALB -> 0x00 0x00 0x00 0x02 -> 2 bytes - correct!

TIT2 -> 0x00 0x00 0x00 0x02 -> 2 bytes - correct!

TPE1 -> 0x00 0x00 0x00 0x02 -> 2 bytes - correct!

TCON -> 0x00 0x00 0x00 0x02 -> 2 bytes - correct!

TCOM -> 0x00 0x00 0x00 0x02 -> 2 bytes - correct!

TRCK -> 0x00 0x00 0x00 0x03 -> 3 bytes - correct!

TLEN -> 0x00 0x00 0x00 0x06 -> 6 bytes - correct!

COMM -> 0x00 0x00 0x00 0x06 -> 6 bytes - correct!

APIC -> 0x00 0x01 0x0F 0x5D -> 18397 bytes - INCORRECT!

For the "APIC" frame my code calculates the wrong size: the actual data size is 71517 bytes.

How do I correctly convert the frame size to an integer value?


Solution

  • The 4th byte of the ID3v2 tag header (the byte right after the "ID3" identifier) tells us the tag's major version.

    While the tag header size is stored the same way in all three versions, the frame size is stored differently in each. Your way of reading is only valid for version 2.4.0:

    bytes \ version   2.2.0                    2.3.0                    2.4.0
    header size       4 * %0xxxxxxx = 28 bit   4 * %0xxxxxxx = 28 bit   4 * %0xxxxxxx = 28 bit
    frame size        $xx xx xx = 24 bit       $xx xx xx xx = 32 bit    4 * %0xxxxxxx = 28 bit
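
The table can be turned into a small decoder that picks the size layout from the tag's major version. This is only a minimal sketch: the function name `decode_frame_size`, its parameters, and the choice to return 0 for an unknown version are assumptions made for illustration and do not come from the question's code.

```cpp
#include <cstdint>

// Decode the size field of an ID3v2 frame header.
// `major` is the 4th byte of the tag header (2, 3 or 4).
// `p` points at the first size byte of the frame header: offset 4 for
// v2.3 and v2.4 (4-character frame IDs), offset 3 for v2.2 (3-character IDs).
// Returns 0 for an unknown version (an arbitrary choice for this sketch).
std::uint32_t decode_frame_size(std::uint8_t major, const std::uint8_t* p)
{
    switch (major) {
    case 2: // v2.2: three plain bytes, 24 bit, big-endian
        return (static_cast<std::uint32_t>(p[0]) << 16) |
               (static_cast<std::uint32_t>(p[1]) << 8)  |
                static_cast<std::uint32_t>(p[2]);
    case 3: // v2.3: four plain bytes, 32 bit, big-endian
        return (static_cast<std::uint32_t>(p[0]) << 24) |
               (static_cast<std::uint32_t>(p[1]) << 16) |
               (static_cast<std::uint32_t>(p[2]) << 8)  |
                static_cast<std::uint32_t>(p[3]);
    case 4: // v2.4: four synchsafe bytes, 7 usable bits each, 28 bit
        return (static_cast<std::uint32_t>(p[0] & 0x7F) << 21) |
               (static_cast<std::uint32_t>(p[1] & 0x7F) << 14) |
               (static_cast<std::uint32_t>(p[2] & 0x7F) << 7)  |
                static_cast<std::uint32_t>(p[3] & 0x7F);
    default:
        return 0;
    }
}
```

Called with the APIC size bytes `0x00 0x01 0x0F 0x5D`, this returns 69469 for major version 3 and 18397 for major version 4. The casts to `std::uint32_t` before shifting avoid shifting a promoted `int` into its sign bit.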

    But there are more inconsistencies, hinting that whichever software wrote the tag misunderstood the format:

    • All the text frames with a size of 2 are effectively empty. Why would software even store empty frames instead of omitting them?
    • All the text frames, including the comment, have an unneeded terminating $00 in them, further inflating the size of the tag for no purpose.
    • The comment frame only carries the information that it is in the language rus, but since both description and text are empty, this is a waste of space (and sense), too.
    • The APIC size is $00 01 0f 5d, which read as a plain 32-bit value is 69,469 in decimal (67.84 KiB). That is 2,048 bytes less than the 71,517 bytes you expected.
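
The arithmetic behind the last point can be checked at compile time. The sketch below reads the same four APIC size bytes both ways; the names `b4` to `b7` follow the question, while `as_synchsafe` and `as_plain` are names made up for this illustration.

```cpp
#include <cstdint>

// The four size bytes of the APIC frame from the question.
constexpr std::uint32_t b4 = 0x00, b5 = 0x01, b6 = 0x0F, b7 = 0x5D;

// Synchsafe reading (ID3v2.4 layout), as in the question's code.
constexpr std::uint32_t as_synchsafe = (b4 << 21) | (b5 << 14) | (b6 << 7) | b7;

// Plain big-endian reading (ID3v2.3 layout).
constexpr std::uint32_t as_plain = (b4 << 24) | (b5 << 16) | (b6 << 8) | b7;

static_assert(as_synchsafe == 18397, "16384 + 1920 + 93");
static_assert(as_plain == 69469, "65536 + 3840 + 93");
static_assert(71517 - as_plain == 2048, "gap to the size the question expected");
```

The plain reading accounts for all but 2,048 bytes of the expected size. The bytes alone do not show where that remaining gap comes from.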