
How do I keep a string key from being freed in a global, mutable HashMap in a Rust DLL?


I'm trying to implement a cross-platform DLL/shared object in Rust. The library initializes a global, mutable HashMap. When I add a key and value to this HashMap, the key (a &'static str) gets freed, overwritten with \0 characters and reused for something else. I don't understand this behaviour: I declared the variable as &'static str, so Rust should not free the memory until the process exits. Can you explain this behaviour? Do you know the right way to solve the problem?

I only see this behaviour in the DLL, not in an executable.

Rust DLL/shared object code:

use std::{sync::Mutex, collections::HashMap};
use lazy_static::lazy_static;
use std::io::{stdout, Write};

// State, the _State trait and DEFAULT_STATE are defined in the full source (not shown here).

#[derive(Clone)]
enum Color {
    Red,
}

impl Color {
    fn value(&self) -> i32 {
        match *self {
            Color::Red    => 1,
        }
    }
}

lazy_static! {
    static ref STATE_OK: State = {
        State {
            name: String::from("OK"),
            color: Color::Red,
            character: "+".to_string(),
        }
    };
    static ref STATES: Mutex<HashMap<&'static str, Box<dyn _State + Send>>> = {
        let _states: HashMap<&'static str, Box<dyn _State + Send>> = HashMap::from(
            [
                (
                    "OK",
                    Box::new(STATE_OK.clone()) as Box<dyn _State + Send>
                )
            ]
        );
        Mutex::new(_states)
    };
}

fn rust_from_c_string (string_c: *const u8) -> &'static str {
    let length = strlen(string_c);
    let slice = unsafe { std::slice::from_raw_parts(string_c, length) };
    std::str::from_utf8(slice).unwrap()
}

fn strlen(string_c: *const u8) -> usize {
    let mut length = 0;
    unsafe {
        while *string_c.add(length) != 0 {
            length += 1;
        }
    }
    length
}

#[no_mangle]
pub extern "C" fn add_state (key_: *const u8, character_: *const u8, color_: *const u8) {
    let key: &'static str = rust_from_c_string(key_);
    let color = rust_from_c_string(color_);
    let character = rust_from_c_string(character_);

    let mut _states = STATES.lock().unwrap();
    _states.insert(
        key,
        Box::new(State {
            name: String::from(key),
            color: match color {
                "red"    => Color::Red,
            },
            character: String::from(character),
        }) as Box<dyn _State + Send>
    );
}

#[no_mangle]
pub extern "C" fn messagef (text_: *const u8, state_: *const u8) {
    let text = rust_from_c_string(text_);
    let state = rust_from_c_string(state_);
    let to_print: String;
    let _states = STATES.lock().unwrap();
    let state = _states.get(&*state.unwrap_or("OK").to_string()).unwrap();
    print!("{}", to_print);
    let _ = stdout().flush();
}

#[no_mangle]
pub extern "C" fn print_all_state () {
    DEFAULT_STATE.lock().unwrap().print("not found");

    let _states = STATES.lock().unwrap();

    for (key, state) in _states.iter() {
        state.print(key);
    }
}

Calling it from Python:

from ctypes import c_char_p, c_ubyte, c_ushort, pointer
from os.path import join, dirname, exists
from os import name, getcwd

if name == "nt":
    from ctypes import windll as sysdll

    filename = "TerminalMessages.dll"
else:
    from ctypes import cdll as sysdll

    filename = "libTerminalMessages.so"

filenames = (join(dirname(__file__), filename), join(getcwd(), filename))
for filename in filenames:
    if exists(filename):
        break
else:
    raise FileNotFoundError(f"Library {filename!r} is missing")

lib = sysdll.LoadLibrary(filename)

lib.print_all_state()

lib.add_state(
    c_char_p("TEST".encode()),
    c_char_p("T".encode()),
    c_char_p("red".lower().encode()),
)

lib.messagef(
    c_char_p("test".encode()),
    c_char_p("TEST".encode()),
)

lib.print_all_state()

The full source code is here.

I took a screenshot in which:

  1. I add two keys and values, named TEST and TEST2.
  2. I list all keys and values: TEST and TEST2 are present with the correct keys and values.
  3. I use my library with the keys initialized at startup; this works correctly.
  4. I list all keys and values again: the TEST2 value is now stored under the key \0\0\0\0\0 (five \0 characters), and the TEST value under the key Ques (part of the word "Question" from a previous message).
  5. I use my library with TEST, TEST2 and the initialized keys: TEST and TEST2 are not found, but the initialized keys are.



Solution

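  • You are handling the Python strings incorrectly: the buffer behind c_char_p("TEST".encode()) does not have a 'static lifetime. "TEST".encode() creates a temporary bytes object that Python is free to release as soon as the add_state call returns.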

    I don't understand this behaviour because i declare the variable as &'static str, so Rust should not free the memory until the process finishes.

    Rust is not in control of memory passed to it from another language, and annotating it as 'static doesn't change that. Rust's lifetimes are descriptive, not prescriptive, and here you are describing the lifetime incorrectly. One of the safety requirements of from_raw_parts is that the lifetime you give the returned slice must not outlive the underlying memory. If you guaranteed that the Python string would stay alive indefinitely, 'static would be correct, but you don't.

    Rust is not freeing the memory; Python is. The temporary bytes object is deallocated after the call (CPython frees an object as soon as its reference count drops to zero), and its memory is reused (or, apparently in this case, cleared with zeros) while Rust still holds a pointer to where it used to be.

    You have to make that guarantee yourself by copying the data into an owned String and then leak()-ing it to get a true &'static str. Something like this:
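    Because the caller can reuse or zero its buffer right after the call, the safe pattern is to copy the bytes into Rust-owned memory before returning. The following is a minimal sketch that simulates the caller clearing its buffer. It uses std::ffi::CStr in place of the hand-written strlen, and the helper name copy_c_str is hypothetical:

```rust
use std::ffi::{c_char, CStr};

/// Copy a NUL-terminated C string into a Rust-owned `String`.
/// Returns `None` for a null pointer or invalid UTF-8 instead of panicking.
///
/// # Safety
/// `ptr` must be null or point to a valid NUL-terminated buffer,
/// but only for the duration of this call.
unsafe fn copy_c_str(ptr: *const c_char) -> Option<String> {
    if ptr.is_null() {
        return None;
    }
    // CStr::from_ptr scans for the terminating NUL, like the hand-written strlen.
    let c_str = unsafe { CStr::from_ptr(ptr) };
    c_str.to_str().ok().map(String::from)
}

fn main() {
    // Stand-in for the temporary bytes object Python passes in.
    let mut caller_buffer: Vec<u8> = b"TEST\0".to_vec();
    let key = unsafe { copy_c_str(caller_buffer.as_ptr() as *const c_char) };

    // Simulate the caller freeing or reusing its memory after the call returns.
    caller_buffer.fill(0);

    // The owned copy is unaffected by what happened to the caller's buffer.
    assert_eq!(key.as_deref(), Some("TEST"));
    println!("{:?}", key); // Some("TEST")
}
```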

    fn rust_from_c_string (string_c: *const u8) -> &'static str {
        let length = strlen(string_c);
        let slice = unsafe { std::slice::from_raw_parts(string_c, length) };
        let str_r = std::str::from_utf8(slice).unwrap();
        let string_r = str_r.to_owned();
        String::leak(string_r)
    }
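    The following sketch shows how a leaked key behaves when the caller zeroes its buffer afterwards. String::leak was stabilized in Rust 1.72. On older compilers, Box::leak(s.into_boxed_str()) produces the same &'static str, and that is what the sketch uses. The helper name leak_key is hypothetical, and the value type is simplified to i32:

```rust
use std::collections::HashMap;

/// Copy `bytes` into a fresh heap allocation and leak it, producing a
/// `&'static str` that really does live until the process exits.
fn leak_key(bytes: &[u8]) -> &'static str {
    let owned = std::str::from_utf8(bytes).expect("key must be UTF-8").to_owned();
    // Same effect as `String::leak(owned)` on Rust 1.72+.
    Box::leak(owned.into_boxed_str())
}

fn main() {
    let mut caller_buffer: Vec<u8> = b"TEST".to_vec();
    let mut states: HashMap<&'static str, i32> = HashMap::new();
    states.insert(leak_key(&caller_buffer), 1);

    // The caller reuses its memory; the leaked copy is untouched.
    caller_buffer.fill(0);

    assert_eq!(states.get("TEST"), Some(&1));
    println!("{:?}", states.keys().collect::<Vec<_>>()); // ["TEST"]
}
```

    Each call leaks one allocation for good, which is acceptable for a small, fixed set of keys. It is not a good fit if keys can be removed or replaced.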
    

    Although at that point you should probably just make your HashMap store Strings from the start (or at least Cow<'static, str>), so that nothing leaks if your API ever needs to remove entries.
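    A minimal sketch of the Cow<'static, str> variant follows. It makes two assumptions: the value type is simplified to i32 in place of Box<dyn _State + Send>, and std::sync::OnceLock (Rust 1.70+) replaces lazy_static!. Built-in keys remain borrowed string literals, while keys received over FFI are stored as owned copies, so removing them frees their memory. The function names states, add_state_owned and remove_state are hypothetical:

```rust
use std::borrow::Cow;
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Simplified stand-in for the real value type `Box<dyn _State + Send>`.
type StateValue = i32;

fn states() -> &'static Mutex<HashMap<Cow<'static, str>, StateValue>> {
    static STATES: OnceLock<Mutex<HashMap<Cow<'static, str>, StateValue>>> = OnceLock::new();
    STATES.get_or_init(|| {
        let mut map = HashMap::new();
        // Built-in key: a string literal, borrowed for 'static at no cost.
        map.insert(Cow::Borrowed("OK"), 0);
        Mutex::new(map)
    })
}

/// Insert a key that came from outside Rust by storing an owned copy.
fn add_state_owned(key: &str, value: StateValue) {
    states().lock().unwrap().insert(Cow::Owned(key.to_owned()), value);
}

/// Removing an owned key drops its `String`, so nothing leaks.
fn remove_state(key: &str) -> Option<StateValue> {
    states().lock().unwrap().remove(key)
}

fn main() {
    let mut caller_buffer = String::from("TEST");
    add_state_owned(&caller_buffer, 1);
    caller_buffer.clear(); // the caller's memory changes; the map's copy does not

    let map = states().lock().unwrap();
    assert_eq!(map.get("OK"), Some(&0));
    assert_eq!(map.get("TEST"), Some(&1));
    drop(map); // release the lock before remove_state locks it again

    assert_eq!(remove_state("TEST"), Some(1));
}
```

    Lookups still take a plain &str, because Cow<'static, str> implements Borrow<str>. The existing _states.get(...) calls would therefore not need to change.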