Tags: c++, perl, binary, mp3, id3

Binary reading ID3 tag of mp3 file


I'm trying to read an MP3 file in C++ and show the ID3 information that the file contains. The problem is that when I read the frame header, the content size it reports is wrong: instead of 10 bytes it gives me 167772160 bytes. http://id3.org/id3v2.3.0#ID3v2_frame_overview

struct Header {
   char tag[3];
   char ver;
   char rev;
   char flags;
   uint8_t hSize[4];
};

struct ContentFrame 
{
   char id[4];
   uint32_t contentSize;
   char flags[2];
};

int ID3_sync_safe_to_int(uint8_t* sync_safe)
{
   uint32_t byte0 = sync_safe[0];
   uint32_t byte1 = sync_safe[1];
   uint32_t byte2 = sync_safe[2];
   uint32_t byte3 = sync_safe[3];

   return byte0 << 21 | byte1 << 14 | byte2 << 7 | byte3;
}

const int FRAMESIZE = 10;

The code above is used to decode the binary data into readable values. Inside main:

Header header;
ContentFrame contentFrame;

ifstream file(argv[1], fstream::binary);
//Read header 
file.read((char*)&header, FRAMESIZE);

// This prints 699, which is the correct tag size
cout << "Size: " << ID3_sync_safe_to_int(header.hSize) << endl << endl;

//Read frame header
file.read((char*)&contentFrame, FRAMESIZE);
//This should print out the frame size. 
cout << "Frame size: " << int(contentFrame.contentSize) << endl;
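The 699 printed for the tag header comes from the sync-safe encoding, where only the low 7 bits of each size byte are used. As an illustration only (a sketch, not the asker's code), here is a constexpr variant of ID3_sync_safe_to_int(). The name sync_safe_to_uint32 and the defensive 0x7F masking are additions, and the bytes 00 00 05 3B are simply how 699 would be encoded, not bytes taken from the actual file.

```cpp
#include <array>
#include <cstdint>

// Decodes a 4-byte ID3v2 sync-safe integer: most significant byte first,
// and only the low 7 bits of each byte count (bit 7 is always zero).
constexpr std::uint32_t sync_safe_to_uint32(std::array<std::uint8_t, 4> b)
{
    return (std::uint32_t(b[0] & 0x7F) << 21) |
           (std::uint32_t(b[1] & 0x7F) << 14) |
           (std::uint32_t(b[2] & 0x7F) << 7)  |
            std::uint32_t(b[3] & 0x7F);
}

// 699 = 5 * 128 + 59, so a 699-byte tag would be stored as 00 00 05 3B.
static_assert(sync_safe_to_uint32({0x00, 0x00, 0x05, 0x3B}) == 699u, "699 decodes");
// Four bytes of 7 bits each give at most 2^28 - 1.
static_assert(sync_safe_to_uint32({0x7F, 0x7F, 0x7F, 0x7F}) == 0x0FFFFFFFu, "max value");
```

Because the function is constexpr, the static_assert lines double as compile-time usage examples.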

I have written a program for this task in Perl and it works fine; there, unpack is used like this:

my ($tag, $ver, $rev, $flags, $size) = unpack("Z3 C C C N", $header);
my ($frameID, $FrameContentSize, $frameFlags) = unpack("Z4 N C2", $content);

sync_safe_to_int is also used there to get the header size right, but the content size is printed without any conversion. The format letters mean (from the Perl pack documentation):
N  An unsigned long (32-bit) in "network" (big-endian) order.
C  An unsigned char (octet) value.
Z  A null-terminated (ASCIZ) string, will be null padded.

The output from my program:
Header content
Tag: ID3
Ver: 3
Rev: 0
Flags: 0
Size: 699

WRONG Output! Frame content
ID: TPE1
size: 167772160
Flags:

Correct output from Perl! Frame content
ID: TPE1
size: 10
Flags: 0


Solution

  • contentFrame.contentSize is defined as uint32_t, but printed as (signed)int.

    Also, as the document states, multibyte numbers are big-endian:

    The bitorder in ID3v2 is most significant bit first (MSB). The byteorder in multibyte numbers is most significant byte first (e.g. $12345678 would be encoded $12 34 56 78).

    No conversion is done for contentFrame.contentSize, however. Those bytes have to be reordered too, combining them most significant byte first as in ID3_sync_safe_to_int(), but this time shifted in multiples of 8 instead of 7 (or use ntohl(), network-to-host order).
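A minimal sketch of that byte-order fix, assuming the four size bytes have already been taken out of the frame header. big_endian_to_uint32 is a hypothetical name modelled on ID3_sync_safe_to_int(); note that ID3v2.3 frame sizes are plain 32-bit big-endian values, not sync-safe ones.

```cpp
#include <array>
#include <cstdint>

// Same shape as ID3_sync_safe_to_int(), but every byte contributes all
// 8 bits, so the shifts are 24, 16, 8 and 0. The bytes are combined
// arithmetically, so the result does not depend on the host byte order.
constexpr std::uint32_t big_endian_to_uint32(std::array<std::uint8_t, 4> b)
{
    return (std::uint32_t(b[0]) << 24) |
           (std::uint32_t(b[1]) << 16) |
           (std::uint32_t(b[2]) << 8)  |
            std::uint32_t(b[3]);
}

// The spec's own example: $12345678 is encoded as $12 34 56 78.
static_assert(big_endian_to_uint32({0x12, 0x34, 0x56, 0x78}) == 0x12345678u, "spec example");
// A size field holding 00 00 00 0A means a 10-byte frame body.
static_assert(big_endian_to_uint32({0x00, 0x00, 0x00, 0x0A}) == 10u, "10-byte frame");
```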

    You say that you get 1677772160 instead of 18, but even with manipulation of the bits/bytes for the above, they don't seem to make sense. Are you sure those are the right numbers? On top of your post you have other values:

    Instead of giving me a low integer under 100 bytes it gives me around 140000 bytes.

    Did you look at the bytes in memory after calling file.read((char*)&contentFrame, FRAMESIZE);? If your ID shows TPE1, however, the read position should be OK. I just wonder whether the numbers you provided are the correct ones, because they don't make sense.
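For the figures in the question as it currently reads, the numbers are consistent with a pure byte-order mix-up, assuming a little-endian host such as x86 (the question does not say which platform is used): 167772160 is 0x0A000000, i.e. the big-endian bytes 00 00 00 0A read least significant byte first. The hypothetical helper below reproduces what the raw read into uint32_t effectively does on such a host.

```cpp
#include <array>
#include <cstdint>

// Models a raw 4-byte read into uint32_t on a little-endian host:
// the FIRST byte in the file becomes the LEAST significant byte.
constexpr std::uint32_t little_endian_to_uint32(std::array<std::uint8_t, 4> b)
{
    return  std::uint32_t(b[0])        |
           (std::uint32_t(b[1]) << 8)  |
           (std::uint32_t(b[2]) << 16) |
           (std::uint32_t(b[3]) << 24);
}

// Big-endian 10 is stored as 00 00 00 0A. Read little-endian, the 0x0A
// lands in the top byte: 0x0A000000 = 10 * 2^24 = 167772160.
static_assert(little_endian_to_uint32({0x00, 0x00, 0x00, 0x0A}) == 167772160u, "misread size");
static_assert(167772160u == 0x0A000000u, "10 shifted into the top byte");
```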

    Update with ntohl() conversion:

    // ntohl() is declared in <arpa/inet.h> on POSIX systems (<winsock2.h> on Windows)
    //Read frame header
    file.read((char*)&contentFrame, FRAMESIZE);
    uint32_t frame_size = ntohl(contentFrame.contentSize);
    cout << "Frame size: " << frame_size << endl;
    

    ntohl() works on both little-endian and big-endian systems (on big-endian systems it simply does nothing).
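ntohl() comes from POSIX (<arpa/inet.h>) or Winsock rather than standard C++. A minimal portable sketch, loosely mirroring the Perl unpack("Z4 N C2", ...), reads the 10 frame-header bytes into a plain byte array and decodes them explicitly, which also avoids depending on how the compiler lays out struct ContentFrame. FrameInfo and read_frame_header are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical decoded ID3v2.3 frame header, field by field.
struct FrameInfo {
    std::string id;                  // like Perl "Z4": up to 4 chars, cut at NUL
    std::uint32_t size = 0;          // like Perl "N": 32-bit big-endian
    std::uint8_t flags[2] = {0, 0};  // like Perl "C2": two raw octets
};

// Reads exactly 10 bytes from `in`; returns false if they are not available.
bool read_frame_header(std::istream& in, FrameInfo& out)
{
    unsigned char raw[10];
    if (!in.read(reinterpret_cast<char*>(raw), sizeof raw))
        return false;

    std::size_t n = 0;               // emulate "Z": stop at the first NUL
    while (n < 4 && raw[n] != 0)
        ++n;
    out.id.assign(reinterpret_cast<const char*>(raw), n);

    // Most significant byte first, independent of the host byte order.
    out.size = (std::uint32_t(raw[4]) << 24) | (std::uint32_t(raw[5]) << 16) |
               (std::uint32_t(raw[6]) << 8)  |  std::uint32_t(raw[7]);

    out.flags[0] = raw[8];
    out.flags[1] = raw[9];
    return true;
}

// Convenience overload for bytes already held in memory.
bool read_frame_header(const std::string& bytes, FrameInfo& out)
{
    std::istringstream in(bytes);
    return read_frame_header(in, out);
}
```

In the question's program this would replace the second file.read: after the 10-byte tag header has been read (and assuming no extended header, which matches Flags: 0), call read_frame_header(file, info) and print info.id and info.size.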