Search code examples
rustbinary-search

Searching a String into Vec<String> in rust


I'm writing a program that interprets a language.

I need to search for a string (not known at compile time) in a Vec.

fn get_name_index(name: &String, array: &Vec<String>) -> usize {
    match array.binary_search(name) {
        Ok(index) => index,
        Err(_) => {
            eprintln!("Error : variable {:?} not found in name array", name);
            std::process::exit(1)
        }
    }
}

This happens multiple times during execution, but at the moment, the array.binary_search() function does not return the right answer.

I searched for the error, but my array is what it should be (printing each element, or examining with gdb: the same), and the error is still there.

Is there any other way to search for a String in a Vec<String>? Or is there an error in my code?

Thanks


Solution

  • First, a few issues: data must be sorted before using a binary search. A binary search is a fast search algorithm (O(log n), or scales as the log of the size of the container), much faster than a linear search (O(n), or scales linear to the size of the container). However, any speed improvements from a binary search are dwarfed by the overhead of sorting the container (O(n log n)).

    Single Search

    Therefore, the best approach depends on how often you search your container. If you are only going to check it a few times, you should use a linear search, as follows:

    fn get_name_index(name: &String, array: &Vec<String>) -> Option<usize> {
        array.iter().position(|&&x| x == name)
    }
    

    Repeated Searches

    If you are going to repeatedly call get_name_index, you should use a binary search (or possibly even better, below):

    // Sort the array before using
    array.sort_unstable();
    
    // Repeatedly call this function
    fn get_name_index(name: &String, array: &Vec<String>) -> Option<usize> {
        match array.binary_search(name) {
            Ok(index) => Some(index),
            Err(_)    => None,
        }
    }
    

    However, this may be suboptimal for some cases. A few considerations: a HashSet may be faster for certain sets of data (O(1) complexity at its best). However, this is slightly misleading, since all the characters of the name must be processed on each compare for a HashSet, while generally only a few characters must be compared to determine whether to jump left or right for a binary search. For data that is highly uniform and mostly differs with a few characters at the end, a HashSet might be better, otherwise, I'd generally recommend using binary_search on the vector.