Upon decompiling various programs (which I do not have the source for), I have found some interesting sequences of code. A program has a C string (str) defined in the DATA section. In some function in the TEXT section, part of that string is set by moving a hexadecimal number to a position in the string (simplified Intel assembly: MOV str, 0x006f6c6c6568). Here is a snippet in C:
#include <stdio.h>
static char str[16];
int main(void)
{
    *(long *)str = 0x006f6c6c6568;
    printf("%s\n", str);
    return 0;
}
I am running macOS, which is little-endian, so 0x006f6c6c6568 translates to hello. The program compiles with no errors or warnings, and when run, prints out hello as expected. I calculated 0x006f6c6c6568 by hand, but I was wondering if C could do it for me. Something like this is what I mean:
#include <stdio.h>
static char str[16];
int main(void)
{
    // *(long *)str = 0x006f6c6c6568;
    *(str+0) = "hello";
    printf("%s\n", str);
    return 0;
}
Here I don't want "hello" to be treated as a string literal. Instead, it might be treated like this on a little-endian target:
*(long *)str = (long)(((long)'h') |
                      ((long)'e' << 8) |
                      ((long)'l' << 16) |
                      ((long)'l' << 24) |
                      ((long)'o' << 32) |
                      ((long)0 << 40));
Or, if compiled for a big-endian target, this:
*(long *)str = (long)(((long) 0 << 16) |
                      ((long)'o' << 24) |
                      ((long)'l' << 32) |
                      ((long)'l' << 40) |
                      ((long)'e' << 48) |
                      ((long)'h' << 56));
Thoughts?
TL;DR: you want strncpy into a uint64_t. This answer is long in an attempt to explain the concepts and how to think about memory from C vs. asm perspectives, and whole integers vs. individual chars / bytes. (i.e. if it's obvious that strlen/memcpy or strncpy would do what you want, just skip to the code.)
If you want to copy exactly 8 bytes of string data into an integer, use memcpy. The object-representation of the integer will then be those string bytes.
Strings always have the first char at the lowest address: a string is a sequence of char elements, so endianness isn't a factor because there's no addressing within a char. Integers are different: which end holds the least-significant byte is endian-dependent.
Storing this integer into memory will have the same byte order as the original string, just like if you'd done memcpy to a char tmp[8] array instead of a uint64_t tmp. (C itself doesn't have any notion of memory vs. register; every object has an address, except when optimization via the as-if rule lets the compiler keep it in a register. Assigning to some array elements can get a real compiler to use store instructions instead of just putting the constant in a register, so you could then look at those bytes with a debugger and see they were in the right order. Or pass a pointer to fwrite or puts or whatever.)
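As a small sketch of that point (through_u64 and roundtrip_matches are hypothetical names, not from the question or any library), the code below copies 8 string bytes into a uint64_t and then stores that integer back into a char array. It assumes CHAR_BIT == 8 and at least 8 readable bytes in the source. The bytes come back in their original order on any host, because only the integer's numeric value depends on endianness, not its object-representation.

```c
#include <stdint.h>
#include <string.h>

// Hypothetical helper: load the first 8 bytes of src into an integer,
// then store that integer into out. The caller must supply at least
// 8 readable bytes in src and 8 writable bytes in out.
void through_u64(char out[8], const char *src)
{
    uint64_t tmp;
    memcpy(&tmp, src, sizeof(tmp));  // load: object-representation = string bytes
    memcpy(out, &tmp, sizeof(tmp));  // store: same bytes, same order
}

// Returns 1 if the bytes survived the trip unchanged.
// This holds regardless of the host's endianness.
int roundtrip_matches(const char *src)
{
    char out[8];
    through_u64(out, src);
    return memcmp(out, src, sizeof(out)) == 0;
}
```

Calling roundtrip_matches("hello world!") should return 1 on little-endian and big-endian machines alike; what differs between them is the numeric value that tmp holds in between.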
memcpy avoids the possible undefined behaviour from alignment and strict-aliasing violations that *(uint64_t*)str = val; can cause. That is, memcpy(str, &val, sizeof(val)) is a safe way to express an unaligned, strict-aliasing-safe 8-byte load or store in C, like you could do easily with mov in x86-64 asm.
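For illustration, here is how the question's store might look when written with memcpy. This is a sketch, assuming CHAR_BIT == 8; store8, reload8 and set_hello are names made up for this example.

```c
#include <stdint.h>
#include <string.h>

static char str[16];

// Store the 8-byte object-representation of val at dst (any alignment).
void store8(char *dst, uint64_t val)
{
    memcpy(dst, &val, sizeof(val));
}

// Read 8 bytes at src back as an integer (any alignment).
uint64_t reload8(const char *src)
{
    uint64_t val;
    memcpy(&val, src, sizeof(val));
    return val;
}

// Same effect as the question's *(long *)str = 0x006f6c6c6568;
// The buffer then reads as "hello" only on a little-endian target.
const char *set_hello(void)
{
    store8(str, 0x006f6c6c6568);
    return str;
}
```

On a little-endian machine, printing the result of set_hello() with %s should give hello, as in the question, but without the aliasing or alignment concerns of the pointer cast.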
(GNU C also lets you typedef uint64_t aliasing_u64 __attribute__((aligned(1), may_alias)); so you can point that at anything and read/write through it safely, just like with an 8-byte memcpy.)
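A minimal sketch of that GNU C typedef in use follows. It only compiles with compilers that implement these GNU attributes (GCC and clang do); it is not ISO C. The helper names load8_alias, store8_alias and unaligned_roundtrip_ok are invented for this example.

```c
#include <stdint.h>

// GNU C extension: a 64-bit integer type with alignment 1 that is
// allowed to alias any other type, much like char does.
typedef uint64_t aliasing_u64 __attribute__((aligned(1), may_alias));

// Unaligned, aliasing-safe 8-byte load through the typedef.
uint64_t load8_alias(const char *p)
{
    return *(const aliasing_u64 *)p;
}

// Unaligned, aliasing-safe 8-byte store through the typedef.
void store8_alias(char *p, uint64_t v)
{
    *(aliasing_u64 *)p = v;
}

// Store and reload at a deliberately misaligned address.
int unaligned_roundtrip_ok(void)
{
    char buf[16] = {0};
    store8_alias(buf + 1, 0x0123456789abcdefULL);
    return load8_alias(buf + 1) == 0x0123456789abcdefULL;
}
```

The memcpy form is the portable choice; the typedef is mainly convenient when you want pointer syntax in code that is already GNU-specific.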
char* and unsigned char* can alias any other type in ISO C, so it's safe to use memcpy and even strncpy to write the object-representation of other types, especially ones that have a guaranteed format / layout like uint64_t (fixed width, no padding, if it exists at all).
If you want shorter strings to zero-pad out to the full size of an integer, use strncpy. On little-endian machines it's like an integer of width CHAR_BIT * strlen() being zero-extended to 64-bit, since the extra zero bytes after the string go into the bytes that represent the most-significant bits of the integer.
On a big-endian machine, the low bits of the value will be zeros, as if you left-shifted that "narrow integer" to the top of the wider integer. (And the non-zero bytes are in a different order wrt. each other.)
On a mixed-endian machine (e.g. PDP-11), it's less simple to describe.
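To make the two cases concrete, the sketch below computes with plain shifts the integer value each kind of machine would see after a zero-padded 8-byte copy. It is independent of the host's own byte order. The function names as_little_endian and as_big_endian are invented for this example, and it assumes CHAR_BIT == 8 and an ASCII-compatible character set.

```c
#include <stddef.h>
#include <stdint.h>

// Value a little-endian machine would see: byte i of the string
// lands in bits [8*i+7 : 8*i], and the missing bytes zero-extend it.
uint64_t as_little_endian(const char *s)
{
    uint64_t v = 0;
    for (size_t i = 0; i < 8 && s[i] != '\0'; i++)
        v |= (uint64_t)(unsigned char)s[i] << (8 * i);
    return v;
}

// Value a big-endian machine would see: byte 0 of the string is the
// most-significant byte, and the missing bytes leave zeros at the bottom.
uint64_t as_big_endian(const char *s)
{
    uint64_t v = 0;
    for (size_t i = 0; i < 8 && s[i] != '\0'; i++)
        v |= (uint64_t)(unsigned char)s[i] << (56 - 8 * i);
    return v;
}
```

For "hi", as_little_endian gives 0x6968 (the two-byte value zero-extended) and as_big_endian gives 0x6869000000000000 (the two-byte value shifted to the top). This is essentially the hand-written shift expression from the question, generalized to any string.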
strncpy is bad for actual strings but exactly what we want here. It's inefficient for normal string-copying because it always writes out to the specified length (wasting time and touching otherwise unused parts of a long buffer for short copies). And it's not very useful for safety with strings because it doesn't leave room for a terminating zero with large source strings.
But both of those things are exactly what we want/need here: it behaves like memcpy(&val, str, 8) for strings of length 8 or higher, but for shorter strings it doesn't leave garbage in the upper bytes of the integer.
#include <string.h>
#include <stdint.h>
uint64_t load8(const char* str)
{
    uint64_t value;
    memcpy(&value, str, sizeof(value));   // load exactly 8 bytes
    return value;
}
uint64_t test2(){
    return load8("hello world!");   // constant-propagation through it
}
This compiles very simply, to one x86-64 8-byte mov instruction using GCC or clang on the Godbolt compiler explorer.
load8:
mov rax, QWORD PTR [rdi]
ret
test2:
movabs rax, 8031924123371070824 # 0x6F77206F6C6C6568
# little-endian "hello wo", note the 0x20 ' ' byte near the top of the value
ret
On ISAs where unaligned loads just work with at worst a speed penalty, e.g. x86-64 and PowerPC64, memcpy reliably inlines. But on MIPS64 you'd get a function call.
# PowerPC64 clang(trunk) -O3
load8:
ld 3, 0(3) # r3 = *r3 first arg and return-value register
blr
BTW, I used sizeof(value) instead of 8 for two reasons. First, so you can change the type without having to manually change a hard-coded size.
Second, because a few obscure C implementations (like modern DSPs with word-addressable memory) don't have CHAR_BIT == 8. Often it's 16 or 24, with sizeof(int) == 1, i.e. the same as a char. I'm not sure exactly how the bytes would be arranged in a string literal, like whether you'd have one character per char word or if you'd just have an 8-letter string in fewer than 8 chars, but at least you wouldn't have undefined behaviour from writing outside a local variable.
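If you would rather reject such targets outright than reason about them, one option (my suggestion, not something the code in this answer does) is a compile-time check. The sketch below uses C11 _Static_assert; u64_size_in_bits is a hypothetical helper that just shows which quantity is fixed and which one varies.

```c
#include <limits.h>
#include <stdint.h>

// Refuse to compile where char is not 8 bits, instead of silently
// assuming that sizeof(uint64_t) == 8.
_Static_assert(CHAR_BIT == 8, "this code assumes 8-bit chars");

// uint64_t has exactly 64 value bits and no padding, so this product
// is 64 wherever the type exists. What varies with CHAR_BIT is
// sizeof(uint64_t) itself.
unsigned u64_size_in_bits(void)
{
    return (unsigned)(sizeof(uint64_t) * CHAR_BIT);
}
```

Using sizeof(value) keeps the copy in bounds either way; the assertion only documents the assumption that 8 chars of string data fill the integer.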
strncpy
// Take the first 8 bytes of the string, zero-padding if shorter
// (on a big-endian machine, that left-shifts the value, rather than zero-extending)
uint64_t stringbytes(const char* str)
{
    // if (!str) return 0;   // optional NULL-pointer check
    uint64_t value;          // strncpy always writes the full size (with zero padding if needed)
    strncpy((char*)&value, str, sizeof(value));   // load up to 8 bytes, zero-extending for short strings
    return value;
}
uint64_t tests1(){
    return stringbytes("hello world!");
}
uint64_t tests2(){
    return stringbytes("hi");
}
tests1():
movabs rax, 8031924123371070824 # same as with memcpy
ret
tests2():
mov eax, 26984 # 0x6968 = little-endian "hi"
ret
The strncpy misfeatures (that make it not good for what people wish it was designed for, a strcpy that truncates to a limit) are why compilers like GCC warn about these valid use-cases with -Wall. That and our non-standard use-case, where we want truncation of a longer string literal just to demo how it would work. That's not strncpy's fault, but the warning about passing a length limit the same as the actual size of the destination is.
In function 'constexpr uint64_t stringbytes2(const char*)',
inlined from 'constexpr uint64_t tests1()' at <source>:26:24:
<source>:20:12: warning: 'char* strncpy(char*, const char*, size_t)' output truncated copying 8 bytes from a string of length 12 [-Wstringop-truncation]
20 | strncpy(u.c, str, 8);
| ~~~~~~~^~~~~~~~~~~~~
<source>: In function 'uint64_t stringbytes(const char*)':
<source>:10:12: warning: 'char* strncpy(char*, const char*, size_t)' specified bound 8 equals destination size [-Wstringop-truncation]
10 | strncpy((char*)&value, str, sizeof(value)); // load up to 8 bytes, zero-extending for short strings
| ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Strangely, GCC for MIPS64 doesn't want to inline strnlen, and PowerPC can more efficiently construct constants larger than 32-bit anyway. (Fewer shift instructions, as oris can OR into bits [31:16], i.e. OR a shifted immediate.)
uint64_t foo = tests1();
uint64_t bar = tests2();
Compiling as C++ to allow function return values as initializers for global vars, clang (trunk) for PowerPC64 compiles the above with constant-propagation into initialized static storage in .data for these global vars, instead of calling a "constructor" at startup to store into the BSS like GCC unfortunately does. (It's weird because GCC's initializer function just constructs the value from immediates itself and stores.)
foo:
.quad 7522537965568948079 # 0x68656c6c6f20776f
# big-endian "hello wo"
bar:
.quad 7523544652499124224 # 0x6869000000000000
# big-endian "h i \0\0\0\0\0\0"
The asm for tests1() can only construct a constant from immediates 16 bits at a time (because an instruction is only 32 bits wide, and some of that space is needed for opcodes and register numbers). Godbolt
# GCC11 for PowerPC64 (big-endian mode, not power64le) -O3 -mregnames
tests2:
lis %r3,0x6869 # Load-Immediate Shifted, i.e. big-endian "hi"<<16
sldi %r3,%r3,32 # Shift Left Doubleword Immediate r3<<=32 to put it all the way to the top of the 64-bit register
# the return-value register holds 0x6869000000000000
blr # return
tests1():
lis %r3,0x6865 # big-endian "he"<<16
ori %r3,%r3,0x6c6c # OR Immediate producing "hell"
sldi %r3,%r3,32 # r3 <<= 32
oris %r3,%r3,0x6f20 # r3 |= "o " << 16
ori %r3,%r3,0x776f # r3 |= "wo"
# the return-value register holds 0x68656c6c6f20776f
blr
I played around a bit with getting constant-propagation to work for an initializer for a uint64_t foo = tests1() at global scope in C++ (C doesn't allow non-constant initializers in the first place) to see if I could get GCC to do what clang does. No success so far. And even with constexpr and C++20 std::bit_cast<uint64_t>(struct_of_char_array) I couldn't get g++ or clang++ to accept uint64_t foo[stringbytes2("h")], which uses the integer value in a context where the language actually requires a constant expression, rather than it just being an optimization. Godbolt.
IIRC std::bit_cast should be able to manufacture a constexpr integer out of a string literal but there might have been some trick I'm forgetting; I didn't search for existing SO answers yet. I seem to recall seeing one where bit_cast was relevant for some kind of constexpr type-punning.
Credit to @selbie for the strncpy idea and the starting point for the code; for some reason they changed their answer to be more complex and avoid strncpy, so it's probably slower when constant-propagation doesn't happen, assuming a good library implementation of strncpy that uses hand-written asm. But either way still inlines and optimizes away with a string literal.
Their current answer with strnlen and memcpy into a zero-initialized value is exactly equivalent to this in terms of correctness, but compiles less efficiently for runtime-variable strings.
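For reference, a strnlen plus memcpy version along those lines might look roughly like the sketch below. This is my reconstruction from the description, not @selbie's actual code, and the names stringbytes_strnlen and is_zero_padded_prefix are made up. Note that strnlen is POSIX rather than ISO C, and the sketch assumes CHAR_BIT == 8.

```c
#define _POSIX_C_SOURCE 200809L  // strnlen is POSIX, not ISO C
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Copy up to 8 string bytes into a pre-zeroed integer.
uint64_t stringbytes_strnlen(const char *str)
{
    uint64_t value = 0;                        // bytes we don't write stay zero
    size_t len = strnlen(str, sizeof(value));  // at most 8, never reads past the terminator
    memcpy(&value, str, len);
    return value;
}

// Returns 1 if value's bytes are the first (up to 8) chars of str
// followed by zero padding, which is the layout strncpy with a bound
// of 8 would produce.
int is_zero_padded_prefix(uint64_t value, const char *str)
{
    unsigned char bytes[sizeof(value)];
    memcpy(bytes, &value, sizeof(value));
    size_t len = strnlen(str, sizeof(value));
    for (size_t i = 0; i < sizeof(value); i++) {
        unsigned char expected = (i < len) ? (unsigned char)str[i] : 0;
        if (bytes[i] != expected)
            return 0;
    }
    return 1;
}
```

The second function is only a byte-level check of the equivalence claimed above: for any string, the result has the same object-representation as the strncpy version, on either endianness.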