I am totally new to Scala and I am having trouble understanding how I can use functions like map() or foreach() to perform operations on strings.
In particular, I am trying to extract all unique contiguous substrings of length k from a string (called k-shingles). My function kshingles(s: String, k: Int), called on the string "abcdab" with k = 2, should return Set("ab", "bc", "cd", "da").
How can I achieve that in Scala? A bonus would be to do it in a way that can be parallelized (e.g. using Spark).
sliding is the method you are looking for. From the sliding documentation:
Groups elements in fixed size blocks by passing a "sliding window" over them (as opposed to partitioning them, as is done in grouped.) The "sliding window" step is set to one.
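The difference between sliding and grouped that the quote describes can be seen on the question's input. This sketch calls both methods on a List[Char] (via toList) instead of directly on the String; that is an assumption made only so it compiles the same way on Scala 2.12 and 2.13. mkString turns each window of characters back into a String.

```scala
object SlidingVsGrouped {
  // Overlapping windows: the window advances one character at a time (step = 1).
  def slidingWindows(s: String, size: Int): List[String] =
    s.toList.sliding(size).map(_.mkString).toList

  // Non-overlapping blocks: the input is partitioned into chunks of `size`.
  def groupedBlocks(s: String, size: Int): List[String] =
    s.toList.grouped(size).map(_.mkString).toList

  def main(args: Array[String]): Unit = {
    println(slidingWindows("abcdab", 2)) // List(ab, bc, cd, da, ab)
    println(groupedBlocks("abcdab", 2))  // List(ab, cd, ab)
  }
}
```

Note that "ab" appears twice in the sliding output; calling toSet is what removes the duplicate.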
For example, "abcdab".sliding(2).toSet will provide the result you are looking for.
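Wrapped in the question's kshingles(s: String, k: Int) signature, the approach could look like the minimal sketch below. The two guards are additions, not part of the one-liner: sliding throws for a non-positive window size, and when the string is shorter than k it yields one shorter window (for "ab" with k = 5 it would give Set("ab")). The sketch assumes an empty set is the desired result in that case.

```scala
object KShingles {
  // All distinct contiguous substrings of length k (k-shingles).
  def kshingles(s: String, k: Int): Set[String] = {
    require(k > 0, "k must be positive")
    // sliding would emit a single short window when s.length < k, so skip it.
    if (s.length < k) Set.empty
    else s.toList.sliding(k).map(_.mkString).toSet
  }

  def main(args: Array[String]): Unit = {
    println(kshingles("abcdab", 2) == Set("ab", "bc", "cd", "da")) // true
  }
}
```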
In Scala 2.13, String.sliding is deprecated. The equivalent that avoids the deprecated call in Scala 2.13 is:
"abcdab".toSeq.sliding(2).map(_.unwrap).toSet
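On the parallelization bonus from the question: shingling one string does not depend on any other string, so a collection of documents can be shingled with a plain map and no shared state. The sketch below swaps in an index-based substring variant instead of sliding, which behaves the same on Scala 2.12 and 2.13; the helper name shinglesPerDoc and the sample documents are hypothetical. Passing the same per-document function to a distributed map, such as Spark's RDD.map, is an assumption not exercised here.

```scala
object ShinglesPerDocument {
  // Index-based k-shingles: the substring of length k starting at each index 0 .. length - k.
  def kshingles(s: String, k: Int): Set[String] =
    if (k <= 0 || s.length < k) Set.empty
    else (0 to s.length - k).map(i => s.substring(i, i + k)).toSet

  // Hypothetical helper: shingle each document independently.
  // No state is shared between documents, which is what makes this map easy to parallelize.
  def shinglesPerDoc(docs: Seq[String], k: Int): Seq[Set[String]] =
    docs.map(doc => kshingles(doc, k))

  def main(args: Array[String]): Unit = {
    val docs = Seq("abcdab", "abab", "a") // hypothetical sample input
    shinglesPerDoc(docs, 2).foreach(println)
  }
}
```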