(I’m referring to boxing as a way to distinguish integers and pointers at runtime. A technique used some programming languages to aid GC like OCaml. Not the Rust Box<T>
type.)
I have a Rust enum that looks like this:
#[derive(Clone, Copy, Debug, PartialEq)]
enum Type<'ts> {
TVar(usize),
Constructed(&'ts ConstructedType<'ts>),
}
As I understand it the memory layout for this enum will be two words. One for the tag and one for the payload. I would like to fit the memory in a single word if possible.
Languages like OCaml use a technique called “integer boxing” which takes advantage of the fact that pointers are aligned. This means that the lowest bit will be 0. If you shift the bits in your integer one space to the left and set the lowest bit of your integer to 1 then you’re using that bit as a tag at the cost of one bit of integer precision.
Are Rust pointers guaranteed to be aligned? How would I implement this technique for my type in Rust?
I may not be following everything you are saying, but I think you need a union
.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Type<'ts> {
TVar(usize),
Constructed(&'ts ConstructedType<'ts>),
}
union CompactType<'ts> {
num: usize,
ptr: &'ts ConstructedType<'ts>
}
impl<'ts> From<CompactType<'ts>> for Type<'ts> {
fn from(compact: CompactType<'ts>) -> Type<'ts> {
unsafe {
if compact.num & 1 == 1 {
Type::TVar(compact.num >> 1)
} else {
Type::Constructed(compact.ptr)
}
}
}
}
Note that accessing members of a union
is unsafe, and you have to make sure that all invariants are enforced. For example, you have to explicitly check that the CompactType
s are correctly created with the values within range, and prevent the possibility of the objects being constructed without being subject to those checks.
I would suggest adding constructor functions to CompactType
, which return a Result
or an Option
, in case you try to use a number that is too large or a pointer to a type that is not properly aligned. When the TryFrom
feature stabilizes, you could use that, but in the meantime:
enum CompactConvertError {
NumTooBig(String),
PtrNotAligned(String),
}
impl<'ts> Type<'ts> {
fn to_compact(&self) -> Result<CompactType<'ts>, CompactConvertError> {
match self {
Type::TVar(num) => {
if num >> (mem::size_of::<usize>() * 8 - 1) == 1 {
Err(CompactConvertError::NumTooBig(
String::from("The last bit of the usize cannot be used here"))
)
} else {
Ok(CompactType { num: num << 1 | 1usize })
}
},
Type::Constructed(c) => {
if mem::align_of_val(*c) % 2 == 1 {
Err(CompactConvertError::PtrNotAligned(
String::from("The pointer must be to a type with even alignment"))
)
} else {
Ok(CompactType { ptr: c })
}
}
}
}
}
This should be flexible enough to replace ConstructedType
with a generic type parameter. The only restriction is that you shouldn't change it from a reference to an owned value, or else you'll have to worry about dropping it correctly — which cannot yet be done for a union
type in stable Rust.
As for alignment, if ConstructedType
is only 1 byte in size, you will need to add an alignment to it, to make sure it is only on an even byte, otherwise Rust may choose to pack them more tightly:
#[align(2)]
struct ConstructedType<'ts> {
// ...
}
Definitely don't add #[align(2)]
if the size is larger than 2 bytes. Perhaps someone else can offer advice on making that part more robust.