
Parsing camel case strings with nom


I want to parse a string like "ParseThis" or "parseThis" into a vector of strings like ["Parse", "This"] or ["parse", "this"] using the nom crate.

None of my attempts so far return the expected result. It's possible that I don't yet understand how to use all the functions in nom.

I tried:

named!(camel_case<(&str)>, 
       map_res!(
           take_till!(is_not_uppercase),
           std::str::from_utf8));

named!(p_camel_case<&[u8], Vec<&str>>,
       many0!(camel_case));

But p_camel_case just returns an Error(Many0) when parsing a string that starts with an uppercase letter. When parsing a string that starts with a lowercase letter, it returns Done, but with an empty string as the result.

How can I tell nom that I want to parse the string, separated by uppercase letters (given there can be a first uppercase or lowercase letter)?


Solution

  • You are looking for chunks that start with any character, followed by any number of non-uppercase characters. As a regex, that would look something like .[^A-Z]*. Translated directly to nom, that's something like:

    #[macro_use]
    extern crate nom;
    
    use nom::anychar;
    
    fn is_uppercase(a: u8) -> bool { (a as char).is_uppercase() }
    
    named!(char_and_more_char<()>, do_parse!(
        anychar >>
        take_till!(is_uppercase) >>
        ()
    ));
    
    named!(camel_case<(&str)>, map_res!(recognize!(char_and_more_char), std::str::from_utf8));
    
    named!(p_camel_case<&[u8], Vec<&str>>, many0!(camel_case));
    
    fn main() {
        println!("{:?}", p_camel_case(b"helloWorld"));
        // Done([], ["hello", "World"])
    
        println!("{:?}", p_camel_case(b"HelloWorld"));
        // Done([], ["Hello", "World"])
    }
    

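    Of course, you probably need to be careful with non-ASCII input: is_uppercase here inspects individual bytes rather than decoded characters, so the bytes of a multi-byte UTF-8 sequence are not classified correctly. You should be able to extend this in a straightforward manner.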
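
To illustrate the non-ASCII point, here is a minimal sketch that applies the same "any character, then non-uppercase characters" rule to a &str instead of a byte slice. It uses only the standard library rather than nom, and the function name split_camel_case is a hypothetical helper introduced for this example, not part of nom or of the code above.

```rust
/// Split a camel-case string into words, starting a new word at each
/// uppercase character. Hypothetical helper: it works on decoded `char`s,
/// so multi-byte UTF-8 characters are classified as a whole.
fn split_camel_case(input: &str) -> Vec<&str> {
    let mut words = Vec::new();
    let mut start = 0; // byte offset where the current word begins

    for (idx, ch) in input.char_indices() {
        // An uppercase character begins a new word, unless it is the
        // first character of the current word.
        if ch.is_uppercase() && idx > start {
            words.push(&input[start..idx]);
            start = idx;
        }
    }

    // Push the trailing word, if the input was not empty.
    if start < input.len() {
        words.push(&input[start..]);
    }
    words
}

fn main() {
    println!("{:?}", split_camel_case("helloWorld"));
    // ["hello", "World"]

    println!("{:?}", split_camel_case("HelloWorld"));
    // ["Hello", "World"]

    println!("{:?}", split_camel_case("éclairÉtoile"));
    // ["éclair", "Étoile"]
}
```

Because char_indices yields byte offsets that always fall on character boundaries, the slices are valid UTF-8 and no from_utf8 conversion is needed. The same predicate (char::is_uppercase on a decoded character) is what a &str-based nom parser would need in place of the byte-level is_uppercase.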