I'm learning how to iterate over a String in Rust. There are plenty of methods, like split_whitespace and lines, along with the split method, which takes a char, a slice of chars, or a closure.
Unfortunately, I can't find any way to iterate over words. GPT and some articles suggest using Regex, but as far as I know, that is an additional dependency. Is there a built-in way to do this?
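For reference, here is a small sketch of how those standard-library methods behave. The sample strings are arbitrary; note that split with a single char yields empty pieces for consecutive separators, while split_whitespace never does.

```rust
fn main() {
    let text = "The quick brown fox\njumps over the lazy dog";

    // split_whitespace: splits on any Unicode whitespace and never yields empty pieces.
    let words: Vec<&str> = text.split_whitespace().collect();
    assert_eq!(words.len(), 9);

    // lines: splits on line endings ("\n" or "\r\n").
    let lines: Vec<&str> = text.lines().collect();
    assert_eq!(lines, ["The quick brown fox", "jumps over the lazy dog"]);

    // split with a single char: consecutive separators produce empty pieces.
    let by_char: Vec<&str> = "a  b".split(' ').collect();
    assert_eq!(by_char, ["a", "", "b"]);

    // split with a slice of chars: any char in the slice acts as a separator.
    let by_slice: Vec<&str> = "a,b c".split(&[',', ' '][..]).collect();
    assert_eq!(by_slice, ["a", "b", "c"]);

    // split with a closure: the closure decides which chars are separators.
    let by_closure: Vec<&str> = "a1b2c".split(|c: char| c.is_ascii_digit()).collect();
    assert_eq!(by_closure, ["a", "b", "c"]);
}
```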
Rust strings are UTF-8 encoded, and UTF-8 is one of the encodings defined by the Unicode standard. Unicode Standard Annex #29 defines rules for splitting text on word boundaries, based partly on lookup tables covering all Unicode characters.
There isn't a built-in way to access the Unicode word boundary rules in Rust. More specifically, the Rust standard library authors decided to exclude splitting on "words" from the standard library and instead focus on simpler functions that split on whitespace (see relevant commit here and discussion here).
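If a rough approximation is good enough, the standard library alone can get close by passing a closure to split. The sketch below (the helper name approx_words is hypothetical) treats every non-alphanumeric character as a separator and drops empty pieces. It does not implement the Annex #29 rules: contractions like "don't" are split in two, and scripts written without spaces are left unsegmented.

```rust
/// Hypothetical helper: approximate "words" using only the standard library.
/// Every char that is not alphanumeric (per char::is_alphanumeric, which is
/// Unicode-aware) is treated as a separator, and empty pieces are skipped.
/// This is NOT Unicode UAX #29 word segmentation.
fn approx_words(s: &str) -> impl Iterator<Item = &str> {
    s.split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
}

fn main() {
    let text = "Hello, world! Don't panic: it's 42.";
    let words: Vec<&str> = approx_words(text).collect();
    // The apostrophes split "Don't" and "it's" into two pieces each.
    assert_eq!(words, ["Hello", "world", "Don", "t", "panic", "it", "s", "42"]);
    println!("{:?}", words);
}
```

Whether splitting "Don't" this way is acceptable depends on your definition of a word, which is exactly what the Unicode rules try to pin down.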
Regexes are one possible solution, as you noted. You can also access the Unicode rules directly by adding the unicode_segmentation crate as a dependency and calling the unicode_words method provided by its UnicodeSegmentation trait.
Regardless of which solution you choose, test it to ensure it conforms to your expectations about the definition of words and word boundaries.