regex parsing rust split

Split a string by commas, but not commas inside a token


Let's say I have this:

something,"another thing"

This can be split easily with a normal split function.

Now I want a more complicated syntax, such as:

something,"in a string, oooh",rgba(4,2,0)

This does not work with a regular split function.
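
To make the failure concrete, here is a small sketch of what the standard `str::split` does with that line. The helper name `naive_split` is made up for illustration and is not part of the repo.

```rust
// Hypothetical helper: splits on every comma, ignoring quotes and parentheses.
fn naive_split(s: &str) -> Vec<&str> {
    s.split(',').collect()
}

fn main() {
    let line = "something,\"in a string, oooh\",rgba(4,2,0)";
    // Prints one piece per line so the unwanted cuts are easy to see.
    for piece in naive_split(line) {
        println!("{piece}");
    }
}
```

The quoted string and the rgba(...) call are each cut apart, so this gives six pieces instead of the three I want.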

I tried things like replacing the commas inside specific types of tokens, but that became overcomplicated, and I feel there has to be a better way.

Then I tried regular expressions, which worked until I had to add a new feature that the regexp I had (which was pretty bad) could not handle. Regexp matching can also be slow, and this is supposed to be as fast as possible.

What would be a better way to solve this?

For extra context, the source repo is https://github.com/hyprland-community/hyprparse and the format in question is the Hyprland config format.


Solution

  • Iterate over the string keeping a context state:

    1. None
    2. Inside a "..."
    3. Inside a (...)

    Inside a context, a comma is not treated as a separator.

    Limitations: This is a midnight hack! It does not handle nested parentheses or escaped quotes.

    See also Rust Playground

    // Splits `s` on commas that are not inside "..." or (...).
    fn split(s: &str) -> Vec<String> {
        // Some('"') or Some('(') while inside a quoted or parenthesized token.
        let mut context = None;
        let mut start = 0;
        let mut items = Vec::new();

        // char_indices yields byte offsets, so slicing stays valid for non-ASCII input.
        for (i, c) in s.char_indices() {
            match context {
                // Inside "...": only a closing quote ends the context.
                Some('"') => {
                    if c == '"' {
                        context = None;
                    }
                }
                // Inside (...): only a closing parenthesis ends the context.
                Some(_) => {
                    if c == ')' {
                        context = None;
                    }
                }
                None => match c {
                    '"' | '(' => context = Some(c),
                    ',' => {
                        items.push(s[start..i].to_string());
                        start = i + 1; // ',' is one byte long
                    }
                    _ => {}
                },
            }
        }
        items.push(s[start..].to_string());
        items
    }


    fn main() {
        let s = "something,\"in a string, oooh\",rgba(4,2,0)";
        println!("{:?}", split(s));
        // -> ["something", "\"in a string, oooh\"", "rgba(4,2,0)"]
    }
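
If nested parentheses ever show up, one possible extension is to count open parentheses instead of storing a single context character. The sketch below is only an illustration under that assumption: `split_nested` is a made-up name, it returns borrowed slices rather than owned strings, and it still does not handle escaped quotes.

```rust
// Hypothetical variant: tracks parenthesis depth instead of a single context
// value, so commas inside nested (...) groups are also ignored.
fn split_nested(s: &str) -> Vec<&str> {
    let mut in_quotes = false;
    let mut depth = 0usize; // number of currently open '('
    let mut start = 0;
    let mut items = Vec::new();

    for (i, c) in s.char_indices() {
        match c {
            '"' => in_quotes = !in_quotes,
            '(' if !in_quotes => depth += 1,
            // saturating_sub keeps a stray ')' from underflowing the counter
            ')' if !in_quotes => depth = depth.saturating_sub(1),
            ',' if !in_quotes && depth == 0 => {
                items.push(&s[start..i]);
                start = i + 1; // ',' is one byte, so this stays on a char boundary
            }
            _ => {}
        }
    }
    items.push(&s[start..]);
    items
}

fn main() {
    let line = "a,f(g(1,2),3),\"x,(y\"";
    println!("{:?}", split_nested(line));
    // -> ["a", "f(g(1,2),3)", "\"x,(y\""]
}
```

Unlike the version above, a quote inside parentheses also starts a quoted context here, which may or may not match what the config format expects.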