I am trying to write a function that subString : string * string -> int that checks if the first string is a substring of the second and its case sensitive.
I want to return the index starting from 0 if the first string is a substring or -1 if it is not. if it appears multiple times just return the index of the first appearance.
for instance:
subString("bc","abcabc") ===>1
subString("aaa","aaaa") ===>0
subString("bc","ABC") ===>-1
I am having a lot of trouble wrapping my brain around this because I am not too familiar with sml or using strings in sml and I am not supposed to use any built in functions like String.sub.
I can use helper functions though.
all I can think of is to use explode somehow in a helper function and somehow check the lists and then implode them, but how do I get the indexed position?
all I have is
fun subString(s1,s2) =
if null s2 then ~1
else if s1 = s2 then 0
else 1+subString(s1, tl s2);
I am thinking of using a helper function that explodes the strings and then maybe compares the two but I can't figure how to get that to work.
I am not supposed to use any built in functions like
String.sub
What a pity! Since strings have an abstract interface while you with lists have direct access to its primary constructors, []
and ::
, you have to use library functions to get anywhere with strings. explode
is also a library function. But okay, if your constraint is that you have to convert your string into a list to solve the exercise, so be it.
Given your current code,
fun subString(s1,s2) = if null s2 then ~1 else if s1 = s2 then 0 else 1+subString(s1, tl s2);
I sense one problem here:
subString ([#"b",#"c"], [#"a",#"b",#"c",#"d"])
~> if null ([#"a",#"b",#"c",#"d"]) then ... else
if [#"b",#"c"] = [#"a",#"b",#"c",#"d"] then ... else
1 + subString([#"b",#"c"], [#"b",#"c",#"d"])
~> 1 + subString([#"b",#"c"], [#"b",#"c",#"d"])
~> 1 + if null ([#"b",#"c",#"d"]) then ... else
if [#"b",#"c"] = [#"b",#"c",#"d"] then ... else
1 + subString([#"b",#"c"], [#"c",#"d"])
It seems that the check s1 = s2
is not exactly enough: We should have liked to say that [#"b",#"c"]
is a substring of [#"b",#"c",#"d"]
because it's a prefix of it, not because it is equivalent. With s1 = s2
you end up checking that something is a valid suffix, not a valid substring. So you need to change s1 = s2
into something smarter.
Perhaps you can build a helper function that determines if one list is a prefix of another and use that here?
As for solving this exercise by explode
ing your strings into lists: This is highly inefficient, so much that Standard ML's sister language Ocaml had explode
entirely removed from the library:
The functions
explode
andimplode
were in older versions of Caml, but we omitted them from OCaml because they encourage inefficient code. It is generally a bad idea to treat a string as a list of characters, and seeing it as an array of characters is a much better fit to the actual implementation.
So first off, String.isSubstring
already exists, so this is a solved problem. But if it weren't, and one wanted to write this compositionally, and String.sub
isn't cheating (it is accessing a character in a string, comparable to pattern matching the head and tail of a list via x::xs
), then let me encourage you to write efficient, composable and functional code:
(* Check that a predicate holds for all (c, i) of s, where
* s is a string, c is every character in that string, and
* i is the position of c in s. *)
fun alli s p =
let val stop = String.size s
fun go i = i = stop orelse p (String.sub (s, i), i) andalso go (i + 1)
in go 0 end
(* needle is a prefix of haystack from the start'th index *)
fun isPrefixFrom (needle, haystack, start) =
String.size needle + start <= String.size haystack andalso
alli needle (fn (c, i) => String.sub (haystack, i + start) = c)
(* needle is a prefix of haystack if it is from the 0th index *)
fun isPrefix (needle, haystack) =
isPrefixFrom (needle, haystack, 0)
(* needle is a substring of haystack if is a prefix from any index *)
fun isSubstring (needle, haystack) =
let fun go i =
String.size needle + i <= String.size haystack andalso
(isPrefixFrom (needle, haystack, i) orelse go (i + 1))
in go 0 end
The general idea here, which you can re-use when building an isSubstring
that uses list recursion rather than string index recursion, is to build the algorithm abstractly: needle
being a substring of haystack
can be defined in simpler terms by needle
being the prefix of haystack
counting from any valid position in haystack
(of course not such that it exceeds haystack
). And determining if something is a prefix is much easier, even easier with list recursion!
This suggestion would leave you with a template,
fun isPrefix ([], _) = ...
| isPrefix (_, []) = ...
| isPrefix (x::xs, y::ys) = ...
fun isSubstring ([], _) = ...
| isSubstring (xs, ys) = ... isPrefix ... orelse ...
As for optimizing the string index recursive solution, you could avoid the double bounds checking in both isPrefixFrom
and in isSubstring
by making isPrefixFrom
a local function only accessible to isPrefix
and isSubstring
; otherwise it will be unsafe.
Testing this,
- isSubstring ("bc", "bc");
> val it = true : bool
- isSubstring ("bc", "bcd");
> val it = true : bool
- isSubstring ("bc", "abc");
> val it = true : bool
- isSubstring ("bc", "abcd");
> val it = true : bool
- isSubstring ("bc", "");
> val it = false : bool