
Best way to read a raw struct from a file


Background (Skippable)

On Linux, the file /var/run/utmp contains several utmp structures, each in raw binary format, stored back to back in the file. utmp itself is relatively large (384 bytes on my machine). I am trying to read this file as raw structs first, and then check afterwards that the data makes sense. I'm not new to Rust, but this is my first real experience with the unsafe side of things.

Problem Statement

I have a file that contains several C struct utmp records (docs). In Rust, I would like to read the entire file into a Vec<libc::utmpx>. More specifically, given a reader open to this file, how can I read one struct utmp?

What I have so far

Below are three different implementations of read_raw, each of which accepts a reader and returns a RawEntry (my alias for libc::utmpx). Which method is most correct? I want this code to be as fast as possible, and I am worried that read_raw0 might be slower than the others if it involves memcpys. What is the best/fastest way to accomplish this?

    use std::io::Read;
    use libc::utmpx as RawEntry;

    const RAW_ENTRY_SIZE: usize = std::mem::size_of::<RawEntry>();
    type RawEntryBuffer = [u8; RAW_ENTRY_SIZE];

    /// Read a raw utmpx struct
    // After testing, this method doesn't work
    pub fn read_raw0<R: Read>(reader: &mut R) -> RawEntry {
        let entry: RawEntry = unsafe { std::mem::zeroed() };
        unsafe {
            let mut entry_buf = std::mem::transmute::<RawEntry, RawEntryBuffer>(entry);
            reader.read_exact(&mut entry_buf[..]).expect("failed to read a utmpx entry");
        }
        entry
    }

    /// Read a raw utmpx struct
    pub fn read_raw1<R: Read>(reader: &mut R) -> RawEntry {
        // Worried this could cause alignment issues, or maybe it's okay
        // because transmute copies
        let mut buffer: RawEntryBuffer = [0; RAW_ENTRY_SIZE];
        reader.read_exact(&mut buffer[..]).expect("failed to read a utmpx entry");
        unsafe { std::mem::transmute::<RawEntryBuffer, RawEntry>(buffer) }
    }

    /// Read a raw utmpx struct
    pub fn read_raw2<R: Read>(reader: &mut R) -> RawEntry {
        let mut entry: RawEntry = unsafe { std::mem::zeroed() };
        unsafe {
            let entry_ptr = std::mem::transmute::<&mut RawEntry, *mut u8>(&mut entry);
            let entry_slice = std::slice::from_raw_parts_mut(entry_ptr, RAW_ENTRY_SIZE);
            reader.read_exact(entry_slice).expect("failed to read a utmpx entry");
        }
        entry
    }

Note: After more testing, it appears read_raw0 doesn't work. I believe this is because transmute takes entry by value and returns a new buffer, so read_exact fills that copy instead of the struct itself.
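A minimal, self-contained sketch of why read_raw0 fails while read_raw2 works, assuming a u32 as a hypothetical stand-in for RawEntry and a byte slice as the reader (&[u8] implements Read). The function names are made up for illustration; the point is only that transmute produces a separate value, whereas a slice built from a pointer to the struct writes into the struct's own storage.

```rust
use std::io::{self, Read};
use std::mem::size_of;

/// Mirrors read_raw0, with a u32 (hypothetical) standing in for RawEntry.
fn read_via_transmute_copy<R: Read>(reader: &mut R) -> io::Result<u32> {
    let value: u32 = 0;
    // transmute takes `value` by value (u32 is Copy, so it is copied) and
    // returns a brand-new array that does not share storage with `value`.
    let mut copy = unsafe { std::mem::transmute::<u32, [u8; 4]>(value) };
    reader.read_exact(&mut copy)?;
    // `copy` now holds the bytes that were read, but `value` is still 0.
    Ok(value)
}

/// Mirrors read_raw2: the byte slice aliases `value`'s own storage.
fn read_via_byte_view<R: Read>(reader: &mut R) -> io::Result<u32> {
    let mut value: u32 = 0;
    // Two casts instead of transmute: &mut u32 -> *mut u32 -> *mut u8.
    let bytes = unsafe {
        std::slice::from_raw_parts_mut(&mut value as *mut u32 as *mut u8, size_of::<u32>())
    };
    reader.read_exact(bytes)?;
    Ok(value)
}
```

Reading the bytes [1, 2, 3, 4] through the first function still yields 0, while the second yields u32::from_ne_bytes([1, 2, 3, 4]).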


Solution

  • This is what I came up with, which I imagine should be about as fast as it gets for reading a single entry. It follows the spirit of your last attempt (read_raw2) but avoids the transmute: converting &mut T to *mut u8 can be done with two casts, t as *mut T as *mut u8. It also uses MaybeUninit instead of std::mem::zeroed to be more explicit (the optimized assembly is likely the same). Lastly, the function is unsafe either way, since the caller must guarantee that the bytes read form a valid T, so we may as well mark it unsafe and do away with the unsafe blocks.

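    use std::io::{self, Read};
    use std::slice::from_raw_parts_mut;
    use std::mem::{MaybeUninit, size_of};

    /// Reads size_of::<T>() bytes from `src` and reinterprets them as a T.
    ///
    /// # Safety
    /// Every possible byte pattern must be a valid T (plain old data such as
    /// libc::utmpx: integers and byte arrays, no references, bools or enums).
    pub unsafe fn read_raw_struct<R: Read, T: Sized>(src: &mut R) -> io::Result<T> {
        // Zeroed rather than uninit(): the Read docs state that passing
        // uninitialized memory to a reader is not safe.
        let mut buffer = MaybeUninit::<T>::zeroed();
        // View the struct's storage as a byte slice of exactly size_of::<T>() bytes.
        let buffer_slice = from_raw_parts_mut(buffer.as_mut_ptr() as *mut u8, size_of::<T>());

        src.read_exact(buffer_slice)?;
        Ok(buffer.assume_init())
    }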