Search code examples
rustiteratorhigher-order-functionsmethod-chaining

Detect duplicated elements of a string slice happening in order


I need to detect and list string characters of slice that repeat themselves in order more or equal than N times. I managed to write non-higher-order-function solution in Rust already (below), but I wonder if this can be simplified to chaining iter methods.

The idea:

let v = "1122253225";
let n = 2;

Output:

There are 2 repetition of '1'
There are 3 repetition of '2'
There are 2 repetition of '2'

Indexes where repetition happens are not important. Repetition must happen in order (ie. 3 repetition of '2' separated by other values from the other 2 repetition of '2' counts as separate output lines).

My non-iterator solution:

    let mut cur_ch = '\0';
    let mut repeat = 0;

    for ch in v.chars() {
        if ch == cur_ch {
            repeat = repeat + 1;
        }
        else {
            if repeat >= n {
                printf!("There are {} repetition of '{}'", repeat, cur_ch);
            }
            cur_ch = ch;
            repeat = 1;
        }
    }

    if repeat >= n {
        printf!("There are {} repetition of '{}'", repeat, cur_ch);
    }

It works, but is there a better way to do so with chaining iter methods?


Solution

  • Here is a solution that uses scan and filter_map:

    fn main() {
        let s = "112225322555";
        let n = 2;
    
        let i = s
            .chars()
            .map(|v| Some(v))
            .chain(std::iter::once(None))
            .scan((0, None), |(count, ch), v| match ch {
                Some(c) if *c == v => {
                    *count += 1;
                    Some((None, *count))
                }
                _ => Some((ch.replace(v), std::mem::replace(count, 1))),
            })
            .filter_map(|(ch, count)| match ch {
                Some(Some(ch)) if count >= n => Some((ch, count)),
                _ => None,
            });
    
        for (ch, num) in i {
            println!("There are {} repititions of {}", num, ch);
        }
    }
    

    Playground Link

    The first step is to use scan to count the number of adjacent characters. The first argument to scan is a state variable, which gets passed to each call of the closure that you pass as the second argument. In this case the state variable is a tuple containing the current character and the number of times it has been seen.

    Note:

    • We need to extend the iteration one beyond the end of the string we are analyzing (otherwise we would miss the case where the end of the string contained a run of characters meeting the criteria). We do this by mapping the iteration into Option<char> and then chaining on a single None. This is better than special-casing a character such as \0, so that we could even count \0 characters.

    • For the same reason, we use Option<char> as the current character within the state tuple.

    • The return value of scan is an iterator over (Option<Option<char>>, i32). The first value in the tuple will be None for each repeated character in the iterator, whereas at each boundary where the character changes it will be Some(Some(char))

    • We use replace to both return the current character and count, at the same time as setting the state tuple to new values

    The last step is to use filter_map to:

    • remove the (None, i32) variants, which indicate no change in the incoming character

    • filter out the cases where the count does not reach the limit n.