Let's say I have a string:
"ab bc cdv gf
ed aqb ahf sd
abcdef"
I want to:
a) Split it by ' ' and/or '\r\n', '\t'.
b) Iterate over the list of substrings produced by splitting on those separators, and match each of them against some criterion (for example, only choose words starting with 'a', i.e. ["ab", "aqb", "ahf", "abcdef"]).
Note: we also can't use Str or any other additional libraries.
I came up with something like this:
let f g =
  String.split_on_char ' ' g
  |> List.iter (fun x -> x);;
Obviously, though, it produces an error. And even if it worked, it wouldn't have split on '\r\n'. Instead of List.iter I could have used List.map (fun x -> x), but then I would just get the list of substrings split on the ' ' character only. So now another question: how can I use
"match (something?) with
| ..."
in this case? I see no way to add match to the code above. Do we use the reverse |> and List.iter in this case, or is there another way I'm not aware of?
Simple approach: keep splitting on each whitespace character we want to split on, use List.concat_map to maintain a "flat" list, and then reject empty strings.
let s = "ab bc cdv gf ed aqb ahf sd abc\r\ndef" in
let split = String.split_on_char in
s
|> split ' '
|> List.concat_map (split '\n')
|> List.concat_map (split '\r')
|> List.filter ((<>) "")
(* Result:
* ["ab"; "bc"; "cdv"; "gf"; "ed"; "aqb"; "ahf"; "sd"; "abc"; "def"]
*)
You might also use your regular expression library of choice and split on \s+, but apparently that isn't allowed.
You could also break this out into a function using a left fold, and supply the characters to split on as a string.
let split_on delims str =
  delims
  |> String.to_seq
  |> Seq.fold_left
       (fun acc delim ->
         List.concat_map (String.split_on_char delim) acc)
       [str]
  |> List.filter ((<>) "")
utop # split_on " \t\r\n" s;;
- : string list =
["ab"; "bc"; "cdv"; "gf"; "ed"; "aqb"; "ahf"; "sd"; "abc"; "def"]
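As for part (b) of the question, one possible way to bring match into the pipeline is to put it inside a predicate and hand that predicate to List.filter. The following is only a sketch: starts_with_char and words_starting_with are hypothetical names chosen for illustration, split_on is repeated so the snippet stands alone, and the input string is assumed to use "\r\n" line breaks as in the question. It also assumes OCaml 4.10 or later, since List.concat_map is used.

```ocaml
(* Same splitting helper as above, repeated so this snippet is self-contained. *)
let split_on delims str =
  delims
  |> String.to_seq
  |> Seq.fold_left
       (fun acc delim -> List.concat_map (String.split_on_char delim) acc)
       [str]
  |> List.filter ((<>) "")

(* Hypothetical helper: true when [word] is non-empty and begins with [c].
   The empty-string case is matched first so that word.[0] is never
   evaluated on an empty string. *)
let starts_with_char c word =
  match word with
  | "" -> false
  | _ -> word.[0] = c

(* Hypothetical helper: split [str] on every character in [delims],
   then keep only the words that begin with [c]. *)
let words_starting_with c delims str =
  split_on delims str
  |> List.filter (starts_with_char c)

let () =
  (* Assumed input: the question's string with "\r\n" line breaks. *)
  words_starting_with 'a' " \t\r\n" "ab bc cdv gf\r\ned aqb ahf sd\r\nabcdef"
  |> List.iter print_endline
```

List.iter works here because print_endline returns unit. The original attempt failed to type-check because fun x -> x returns a string, while List.iter expects a function that returns unit.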