
How can I split a stream on either carriage return (\r) or CRLF (\r\n) line terminators?


I'm trying to split an odd serial port stream that separates lines with carriage-return \r and sometimes \r\n. BufReader has the lines function, but it only splits on \n or \r\n. There is a .read_until(...) function, but it only works for a single terminator.

Based on the standard library's implementation, I've started to cobble together some bits, but I haven't gotten it to compile yet. I'd like to do this the idiomatic "Rust way". Regular expressions seem too expensive for a byte stream.

Example input:

Heading:\r\nLine 1\rLine 2\rLine 3\r\nEnd

When you use lines() on that input, you get three lines because \r is not considered a line terminator:

Heading:
Line 1\rLine 2\rLine 3
End
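
The behaviour described above can be reproduced with a short sketch. The helper name `split_with_lines` is made up for illustration; everything else is the standard library's `BufRead::lines`.

```rust
use std::io::{BufRead, BufReader};

// Hypothetical helper: collect whatever `BufRead::lines` yields for `input`.
fn split_with_lines(input: &str) -> Vec<String> {
    BufReader::new(input.as_bytes())
        .lines()
        .map(|line| line.expect("in-memory input is valid UTF-8"))
        .collect()
}

fn main() {
    let lines = split_with_lines("Heading:\r\nLine 1\rLine 2\rLine 3\r\nEnd");
    // Only "\n" ends a line (a "\r" directly before it is stripped), so the
    // bare "\r" bytes stay inside the second line.
    assert_eq!(lines.len(), 3);
    for line in &lines {
        println!("{:?}", line);
    }
}
```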

Solution

  • Adapted from a previous answer of mine on GitHub to fit your needs:

    use std::io::{BufRead, BufReader};
    use std::str;
    
    #[derive(Debug)]
    pub struct MyLines<B> {
        buffer: B,
    }
    
    #[derive(Debug)]
    pub enum MyError {
        Io(std::io::Error),
        Utf8(std::str::Utf8Error),
    }
    
    impl<B> MyLines<B> {
        pub fn new(buffer: B) -> Self {
            Self { buffer }
        }
    }
    
    impl<B: BufRead> Iterator for MyLines<B> {
        type Item = Result<String, MyError>;
    
        fn next(&mut self) -> Option<Self::Item> {
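            let (line, total) = {
                // Caveat: `fill_buf` exposes only the bytes currently buffered, so a
                // line (or a "\r\n" pair) that straddles the end of the buffer is
                // returned in two pieces.
                let buffer = match self.buffer.fill_buf() {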
                    Ok(buffer) => buffer,
                    Err(e) => return Some(Err(MyError::Io(e))),
                };
                if buffer.is_empty() {
                    return None;
                }
                let consumed = buffer
                    .iter()
                    .take_while(|c| **c != b'\n' && **c != b'\r')
                    .count();
                let total = consumed
                    + if consumed < buffer.len() {
                        // We stopped on a delimiter.
                        if consumed + 1 < buffer.len() // check for the two-byte "\r\n"
                        && buffer[consumed] == b'\r'
                        && buffer[consumed + 1] == b'\n'
                        {
                            2
                        } else {
                            1
                        }
                    } else {
                        0
                    };
                let line = match str::from_utf8(&buffer[..consumed]) {
                    Ok(line) => line.to_string(),
                    Err(e) => return Some(Err(MyError::Utf8(e))),
                };
                (line, total)
            };
            self.buffer.consume(total);
    
            Some(Ok(line))
        }
    }
    
    fn main() {
        let f = BufReader::new("Heading:\r\nLine 1\rLine 2\rLine 3\r\nEnd".as_bytes());
    
        for line in MyLines::new(f) {
            println!("{:?}", line);
        }
    }
    

    Output:

    Ok("Heading:")
    Ok("Line 1")
    Ok("Line 2")
    Ok("Line 3")
    Ok("End")
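
For a real serial port, a line or a "\r\n" pair may arrive split across two reads. One possible way to cope with that is sketched below, under some assumptions. The names `CrLines` and `skip_lf` are hypothetical. Only "\r" and "\r\n" are treated as terminators, as in the question, so a lone "\n" stays in the line, unlike the code above. UTF-8 failures are folded into `io::Error` instead of a custom error type. The sketch leans on `read_until`, which keeps reading across buffer refills, and remembers a trailing "\r" so that a "\n" arriving later can be dropped without blocking before the line is returned.

```rust
use std::io::{self, BufRead, BufReader};

/// Hypothetical iterator over lines terminated by "\r" or "\r\n".
pub struct CrLines<B> {
    reader: B,
    /// Set when the previous line ended in "\r": a "\n" arriving next is the
    /// second half of a "\r\n" pair and must be dropped.
    skip_lf: bool,
}

impl<B: BufRead> CrLines<B> {
    pub fn new(reader: B) -> Self {
        Self { reader, skip_lf: false }
    }
}

impl<B: BufRead> Iterator for CrLines<B> {
    type Item = io::Result<String>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.skip_lf {
            // Peek at the next byte without consuming it.
            let starts_with_lf = match self.reader.fill_buf() {
                Ok(buf) => buf.first() == Some(&b'\n'),
                Err(e) => return Some(Err(e)),
            };
            if starts_with_lf {
                self.reader.consume(1);
            }
            self.skip_lf = false;
        }

        let mut bytes = Vec::new();
        // `read_until` refills the buffer as often as needed, so a long line
        // is never cut at a buffer boundary.
        match self.reader.read_until(b'\r', &mut bytes) {
            Ok(0) => None, // end of input
            Ok(_) => {
                if bytes.last() == Some(&b'\r') {
                    bytes.pop();
                    self.skip_lf = true;
                }
                Some(
                    String::from_utf8(bytes)
                        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e)),
                )
            }
            Err(e) => Some(Err(e)),
        }
    }
}

fn main() {
    let input = "Heading:\r\nLine 1\rLine 2\rLine 3\r\nEnd";
    // A 3-byte buffer forces lines and "\r\n" pairs to straddle refills.
    let reader = BufReader::with_capacity(3, input.as_bytes());
    for line in CrLines::new(reader) {
        println!("{:?}", line);
    }
}
```

Even with the deliberately tiny buffer, this prints the same five lines as the output above.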