
Equivalent of Cons Pattern from F# in Rust for Strings


I am experimenting with Rust by reimplementing a small F# snippet of mine.

I am at the point where I want to destructure a string of characters. Here is the F#:

open System

let rec internalCheck acc = function
    | w :: tail when Char.IsWhiteSpace(w) ->
        internalCheck acc tail
    // ...other matches here

...which can be called like this: internalCheck [] (List.ofSeq "String here"). In the pattern, the :: operator binds its right-hand side to the "rest of the list".

So I checked the Rust documentation, which has examples of destructuring vectors like this:

let v = vec![1,2,3];

match v {
    [] => ...
    [first, second, ..rest] => ...
}

...and so on. However, this is now behind the slice_patterns feature gate. I tried something similar:

match input.chars() {
    [w, ..] => ...
}

The compiler informed me that feature gates can only be used on non-stable releases.

So I installed multirust and the latest nightly I could find (2016-01-05). When I finally got the slice_patterns feature working, I ran into endless syntax errors, including "rest" (in the example above) not being allowed.

So, is there an equivalent way in Rust to destructure a string of characters with ::-like functionality? Basically, I want to match one character with a guard and use "everything else" in the expression that follows.

It is perfectly acceptable if the answer is "No, there isn't". I certainly cannot find many examples of this sort online anywhere and the slice pattern matching doesn't seem to be high on the feature list.

(I will happily delete this question if I missed something in the Rust documentation.)


Solution

  • You can use pattern matching on a byte slice:

    #![feature(slice_patterns)]
    
    fn internal_check(acc: &[u8]) -> bool {
        match acc {
            &[b'-', ref tail..] => internal_check(tail),
            &[ch, ref tail..] if (ch as char).is_whitespace() => internal_check(tail),
            &[] => true,
            _ => false,
        }
    }
    
    fn main() {
        for s in ["foo", "bar", "   ", " - "].iter() {
            println!("text '{}', checks? {}", s, internal_check(s.as_bytes()));
        }
    }
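
    On current stable Rust (1.42 and later), slice patterns with subslice bindings are stabilized, so the feature gate is no longer needed and the old ref tail.. syntax is written tail @ .. instead. The following is a sketch of the same byte-slice check for a stable toolchain. The logic is unchanged from the answer's version; only the pattern syntax differs.

```rust
// Stable Rust 1.42+: subslice patterns need no #![feature] attribute.
fn internal_check(acc: &[u8]) -> bool {
    match acc {
        // `tail @ ..` binds the rest of the slice, much like `tail` in F#'s `w :: tail`.
        [b'-', tail @ ..] => internal_check(tail),
        // With default binding modes `ch` is a `&u8`, hence the dereference.
        [ch, tail @ ..] if (*ch as char).is_whitespace() => internal_check(tail),
        [] => true,
        _ => false,
    }
}

fn main() {
    for s in ["foo", "bar", "   ", " - "].iter() {
        println!("text '{}', checks? {}", s, internal_check(s.as_bytes()));
    }
}
```

    One caveat of working on bytes: only ASCII whitespace is recognized reliably. For example, U+00A0 NO-BREAK SPACE is encoded as the bytes 0xC2 0xA0, and 0xC2 cast to char is 'Â', so this version returns false for it even though char::is_whitespace treats U+00A0 as whitespace.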
    

    You can also use it on a char slice (where each char is a Unicode Scalar Value):

    #![feature(slice_patterns)]
    
    fn internal_check(acc: &[char]) -> bool {
        match acc {
            &['-', ref tail..] => internal_check(tail),
            &[ch, ref tail..] if ch.is_whitespace() => internal_check(tail),
            &[] => true,
            _ => false,
        }
    }
    
    fn main() {
        for s in ["foo", "bar", "   ", " - "].iter() {
            println!("text '{}', checks? {}",
                     s, internal_check(&s.chars().collect::<Vec<char>>()));
        }
    }
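
    The char-slice version looks like this on stable Rust 1.42 and later. It is again a sketch that swaps ref tail.. for tail @ ..; everything else is as above. Note that collecting into a Vec<char> still allocates 4 bytes per character.

```rust
// Stable Rust 1.42+ version of the char-slice check.
fn internal_check(acc: &[char]) -> bool {
    match acc {
        ['-', tail @ ..] => internal_check(tail),
        // `ch` is a `&char`; method calls auto-dereference it.
        [ch, tail @ ..] if ch.is_whitespace() => internal_check(tail),
        [] => true,
        _ => false,
    }
}

fn main() {
    for s in ["foo", "bar", "   ", " - "].iter() {
        // Decode the UTF-8 string into a vector of Unicode Scalar Values first.
        let chars: Vec<char> = s.chars().collect();
        println!("text '{}', checks? {}", s, internal_check(&chars));
    }
}
```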
    

    But as of now it doesn't work with a &str (the compiler reports E0308, mismatched types). I think that is for the best, because a &str is neither here nor there. Under the hood it is a byte slice, but Rust guarantees that it is valid UTF-8 and nudges you to work with it in terms of Unicode sequences and characters rather than bytes. So to match efficiently on a &str, we have to call the as_bytes method explicitly, essentially telling Rust that "we know what we're doing".
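
    A small illustration of why a &str is not simply a slice of characters. The sample string "héllo" and the helper name byte_and_char_counts are hypothetical, chosen here only for demonstration. The program shows that len counts UTF-8 bytes, that as_bytes exposes the raw encoding, and that not every byte index is a valid place to split the string.

```rust
/// Hypothetical helper: returns (number of UTF-8 bytes, number of chars).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let s = "héllo";
    // 'é' (U+00E9) is encoded as the two bytes 0xC3 0xA9 in UTF-8,
    // so the byte count and the char count differ.
    let (bytes, chars) = byte_and_char_counts(s);
    println!("{} bytes, {} chars", bytes, chars);
    // The raw UTF-8 bytes behind the &str.
    println!("{:?}", s.as_bytes());
    // Byte index 2 falls inside 'é', so it is not a char boundary.
    println!("char boundary at 2? {}", s.is_char_boundary(2));
}
```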

    That's my reading, anyway. If you want to dig deeper into the source code of the Rust compiler, you might start with issue 1844 and browse the commits and issues linked there.

    > Basically I want to match 1 character with a guard and use "everything else" in the expression that follows.

    If you only want to match on one character at a time, then iterating with chars and matching on each character might be better than converting the entire UTF-8 &str into a &[char] slice. For one thing, the chars iterator decodes characters on the fly, so you don't have to allocate memory for a character array.

    fn internal_check(acc: &str) -> bool {
        for ch in acc.chars() {
            match ch {
                // Dashes and whitespace are allowed; keep scanning.
                '-' => (),
                ch if ch.is_whitespace() => (),
                // Anything else fails the check immediately.
                _ => return false,
            }
        }
        true
    }
    
    fn main() {
        for s in ["foo", "bar", "   ", " - "].iter() {
            println!("text '{}', checks? {}", s, internal_check(s));
        }
    }
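
    The same loop can also be written with the Iterator::all adapter. This is a sketch of an equivalent formulation, not something the original code depends on. all stops at the first character for which the closure returns false, and it returns true for an empty string, which matches the loop's behavior.

```rust
fn internal_check(acc: &str) -> bool {
    // Short-circuits on the first character that is neither '-' nor whitespace.
    acc.chars().all(|ch| ch == '-' || ch.is_whitespace())
}

fn main() {
    for s in ["foo", "bar", "   ", " - "].iter() {
        println!("text '{}', checks? {}", s, internal_check(s));
    }
}
```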
    

    You can also use the chars iterator to split the &str at a Unicode Scalar Value boundary, taking the first character with next and getting the rest back as a &str with as_str:

    fn internal_check(acc: &str) -> bool {
        let mut chars = acc.chars();
        match chars.next() {
            Some('-') => internal_check(chars.as_str()),
            Some(ch) if ch.is_whitespace() => internal_check(chars.as_str()),
            None => true,
            _ => false,
        }
    }
    
    fn main() {
        for s in ["foo", "bar", "   ", " - "].iter() {
            println!("text '{}', checks? {}", s, internal_check(s));
        }
    }
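
    To get something closer to F#'s head :: tail pattern, the split can be wrapped in a small helper that returns the first character together with the rest of the string, so that a single match can destructure both and apply guards. The name split_first_char is hypothetical; the standard library has no function by that name for &str. It is built only on chars, next and Chars::as_str.

```rust
/// Hypothetical helper: splits a string into its first char and the rest,
/// similar to F#'s `head :: tail`. Returns None for an empty string.
fn split_first_char(s: &str) -> Option<(char, &str)> {
    let mut chars = s.chars();
    // `as_str` returns the not-yet-consumed remainder of the original string.
    chars.next().map(|first| (first, chars.as_str()))
}

fn internal_check(acc: &str) -> bool {
    match split_first_char(acc) {
        Some(('-', tail)) => internal_check(tail),
        Some((ch, tail)) if ch.is_whitespace() => internal_check(tail),
        None => true,
        _ => false,
    }
}

fn main() {
    for s in ["foo", "bar", "   ", " - "].iter() {
        println!("text '{}', checks? {}", s, internal_check(s));
    }
}
```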
    

    But keep in mind that, as of now, Rust does not guarantee that this tail-recursive function will be optimized into a loop, so a very long input could overflow the stack. (Guaranteed tail call optimization would be a welcome addition to the language, but it has not been implemented so far due to LLVM-related difficulties.)
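
    If stack depth is a concern, the tail call can be rewritten by hand as a loop that keeps the same head/tail structure. This is a sketch. Each iteration takes the first character with next and replaces the remaining input with chars.as_str(), which is exactly what the recursive call did.

```rust
fn internal_check(acc: &str) -> bool {
    // `rest` plays the role of the argument in the recursive version.
    let mut rest = acc;
    loop {
        let mut chars = rest.chars();
        match chars.next() {
            // Instead of recursing on the tail, continue the loop with it.
            Some('-') => rest = chars.as_str(),
            Some(ch) if ch.is_whitespace() => rest = chars.as_str(),
            None => return true,
            _ => return false,
        }
    }
}

fn main() {
    for s in ["foo", "bar", "   ", " - "].iter() {
        println!("text '{}', checks? {}", s, internal_check(s));
    }
}
```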