
Does memcpy copy bytes in reverse order?


I am a little bit confused about the usage of memcpy. I thought memcpy could be used to copy chunks of binary data to any address we desire. I was trying to implement a small piece of logic to directly convert 2 bytes of hex to a 16-bit signed integer without using a union.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    int main()
    {   uint8_t message[2] = {0xfd,0x58};
        // int16_t roll = message[0]<<8;
        // roll|=message[1];
        int16_t roll = 0;
        memcpy((void *)&roll,(void *)&message,2);
        printf("%x",roll);
    
        return 0;
    }

This prints 58fd instead of fd58.


Solution

  • No, memcpy did not reverse the bytes as it copied them. That would be a strange and wrong thing for memcpy to do.
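
As a quick sanity check of that claim, here is a minimal sketch. The function name `memcpy_preserves_order` is made up for illustration; it copies two bytes into an `int16_t` exactly as the question does, then compares the destination's memory with the source, byte by byte in address order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical check: copy two bytes into an int16_t with memcpy, then
   compare the destination's bytes with the source in address order.
   Returns 1 if they match, 0 otherwise. */
int memcpy_preserves_order(const uint8_t src[2])
{
    int16_t roll = 0;
    memcpy(&roll, src, sizeof roll);
    return memcmp(&roll, src, sizeof roll) == 0;
}
```

Called with the `{0xfd, 0x58}` array from the question, this returns 1 on any machine: the bytes sit in `roll` in the same order they had in the array. What differs between machines is how those two bytes are interpreted as a number.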

    The reason the bytes seem to be in the "wrong" order in the program you wrote is that that's the order they're actually in! There's probably a canonical answer on this somewhere, but here's what you need to understand about byte order, or "endianness".

    When you declare a string, it's laid out in memory just about exactly as you expect. Suppose I write this little code fragment:

    #include <stdio.h>

    int main(void)
    {
        char string[] = "Hello";
        printf("address of string:   %p\n", (void *)&string);
        printf("address of 1st char: %p\n", (void *)&string[0]);
        printf("address of 5th char: %p\n", (void *)&string[4]);
        return 0;
    }
    

    If I compile and run it, I get something like this:

    address of string:   0xe90a49c2
    address of 1st char: 0xe90a49c2
    address of 5th char: 0xe90a49c6
    

    This tells me that the bytes of the string are laid out in memory like this:

    0xe90a49c2    H
    0xe90a49c3    e
    0xe90a49c4    l
    0xe90a49c5    l
    0xe90a49c6    o
    0xe90a49c7    \0
    

    Here I've shown the string vertically, but if we laid it out horizontally, with addresses increasing from left to right, we would see the characters of the string "Hello" laid out from left to right also, just as we would expect.

    That's how it works for strings, which are arrays of char. Integers of various sizes, however, are not really built out of characters, and it turns out that the individual bytes of an integer are not necessarily laid out in memory in "left-to-right" order as we might expect. In fact, on the vast majority of machines today, the bytes within an integer are laid out in the opposite order. Let's take a closer look at how that works.

    Suppose I write this code:

    int16_t i2 = 0x1234;
    printf("address of short:    %p\n", (void *)&i2);
    unsigned char *p = (unsigned char *)&i2;   /* view the integer as raw bytes */
    printf("%p: %02x\n", (void *)p, *p);
    p++;
    printf("%p: %02x\n", (void *)p, *p);
    

    This initializes a 16-bit (or "short") integer to the hex value 0x1234, and then uses a pointer to print the two bytes of the integer in "left-to-right" order, that is, with the lower-addressed byte first, followed by the higher-addressed byte. On my machine, the result is something like:

    address of short:    0xe68c99c8
    0xe68c99c8: 34
    0xe68c99c9: 12
    

    You can clearly see that the byte that's stored at the "front" of the two-byte region in memory is 34, followed by 12. The least-significant byte is stored first. This is referred to as "little endian" byte order, because the "little end" of the integer — its least-significant byte, or LSB — comes first.

    Larger integers work the same way:

    int32_t i4 = 0x5678abcd;
    printf("address of long:     %p\n", (void *)&i4);
    p = (unsigned char *)&i4;   /* reuse the byte pointer from above */
    printf("%p: %02x\n", (void *)p, *p);
    p++;
    printf("%p: %02x\n", (void *)p, *p);
    p++;
    printf("%p: %02x\n", (void *)p, *p);
    p++;
    printf("%p: %02x\n", (void *)p, *p);
    

    This prints:

    address of long:     0xe68c99bc
    0xe68c99bc: cd
    0xe68c99bd: ab
    0xe68c99be: 78
    0xe68c99bf: 56
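
The same inspection can be written once for an object of any size. The sketch below uses a hypothetical helper, `bytes_to_hex`, which is not part of any library; it formats an object's bytes, lowest address first, into a buffer supplied by the caller.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: write the bytes of an object into `out` as
   space-separated two-digit hex, lowest address first.
   Assumes size > 0 and that `out` has room for 3 * size characters
   (two hex digits per byte, separators, and the terminating '\0'). */
void bytes_to_hex(const void *obj, size_t size, char *out)
{
    const unsigned char *p = obj;   /* any object may be read as unsigned char */
    for (size_t i = 0; i < size; i++) {
        if (i > 0)
            *out++ = ' ';
        out += sprintf(out, "%02x", (unsigned)p[i]);
    }
}
```

For a 32-bit integer holding 0x5678abcd, `bytes_to_hex(&i4, sizeof i4, buf)` would leave `cd ab 78 56` in `buf` on a little-endian machine and `56 78 ab cd` on a big-endian one.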
    

    There are machines that lay the bytes out in the other order, with the most-significant byte (MSB) first. Those are called "big endian" machines, but for reasons I won't go into, they're not as popular.
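
If you are curious which kind of machine you are on, a program can find out at run time. This is only a sketch, and `is_little_endian` is a name chosen for illustration: it stores the value 1 in a 16-bit integer and looks at the lowest-addressed byte.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the least-significant byte of a
   16-bit integer is stored at the lowest address, 0 otherwise. */
int is_little_endian(void)
{
    uint16_t probe = 0x0001;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* read the lowest-addressed byte */
    return first == 0x01;
}
```

On the machine that produced the output above, this would return 1. In most code a check like this is not needed, because values can be assembled without relying on the machine's byte order at all.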

    How do you construct an integer value out of individual bytes if you don't know your machine's byte order? The best way is to do it "mathematically", based on the properties of the numbers. For example, let's go back to your original array of bytes:

    uint8_t message[2] = {0xfd, 0x58};
    

    Now, you know, because you wrote it, that 0xfd is supposed to be the MSB and 0x58 is supposed to be the LSB. So one good way of combining them together into an integer is like this:

    int16_t roll = message[0] << 8;    /* MSB */
    roll |= message[1];                /* LSB */
    

    The nice thing about this code is that it works correctly on machines of either endianness. I called this technique "mathematical" because it's equivalent to doing it this other way:

    int16_t roll = message[0] * 256;   /* MSB */
    roll += message[1];                /* LSB */
    

    And, in fact, this suggestion of mine involving roll = message[0] << 8 is very close to something you already tried, but had commented out in the code you posted. The difference is that you don't want to think about it in terms of two bytes next to each other in memory; you want to think about it in terms of the most- and least-significant byte. When you say << 8, you're obviously thinking about the most-significant byte, so that should be message[0].
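
One detail worth hedging: with these particular bytes the combined value 0xfd58 is 64856, which does not fit in an int16_t, so converting it to a signed type is implementation-defined in the C standard (GCC, for one, documents that it wraps modulo 2^16). The sketch below shows one way to sidestep that by assembling the value in an unsigned type and doing the sign adjustment arithmetically. The helper name `be16_to_i16` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: combine two bytes, most-significant byte first,
   into a signed 16-bit value without relying on the machine's byte
   order or on implementation-defined conversions. */
int16_t be16_to_i16(const uint8_t bytes[2])
{
    /* Assemble in an unsigned type: bytes[0] is the MSB, bytes[1] the LSB. */
    uint16_t u = (uint16_t)(((uint16_t)bytes[0] << 8) | bytes[1]);

    /* If the top bit is set, the value is negative in two's complement:
       subtract 65536 in a wider type so the result fits in int16_t. */
    if (u & 0x8000u)
        return (int16_t)((int32_t)u - 65536);
    return (int16_t)u;
}
```

With the `{0xfd, 0x58}` array from the question this returns -680, which is what 0xfd58 means as a signed 16-bit number. Printing a negative int16_t with `%x` will typically show `fffffd58` on platforms with a 32-bit int, because the value is promoted to int first; converting it to `uint16_t` before printing shows `fd58`.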