Search code examples
stringsmalltalkpharovisualworksdolphin-smalltalk

Indices of a substring in Smalltalk


It seems Smalltalk implementations misses an algorithm which return all the indices of a substring in a String. The most similar ones returns only one index of an element, for example : firstIndexesOf:in: , findSubstring:, findAnySubstring: variants.

There are implementations in Ruby but the first one relies on a Ruby hack, the second one does not work ignoring overlapping Strings and the last one uses an Enumerator class which I don't know how to translate to Smalltalk. I wonder if this Python implementation is the best path to start since considers both cases, overlapping or not and does not uses regular expressions.

My goal is to find a package or method which provides the following behavior:

'ABDCDEFBDAC' indicesOf: 'BD'. "#(2 8)"

When overlapping is considered:

'nnnn' indicesOf: 'nn' overlapping: true. "#(0 2)"

When overlapping is not considered:

'nnnn' indicesOf 'nn' overlapping: false. "#(0 1 2)"

In Pharo, when a text is selected in a Playground, a scanner detects the substring and highlights matches. However I couldn't find a String implementation of this.

My best effort so far results in this implementation in String (Pharo 6):

indicesOfSubstring: subString
  | indices i |

  indices := OrderedCollection new: self size.
  i := 0.
  [ (i := self findString: subString startingAt: i + 1) > 0 ] whileTrue: [
    indices addLast: i ].
  ^ indices

Solution

  • Let me firstly clarify that Smalltalk collections are 1-based, not 0-based. Therefore your examples should read

    'nnnn' indexesOf: 'nn' overlapping: false. "#(1 3)"
    'nnnn' indexesOf: 'nn' overlapping: true. "#(1 2 3)"
    

    Note that I've also taken notice of @lurker's observation (and have tweaked the selector too).

    Now, starting from your code I would change it as follows:

    indexesOfSubstring: subString overlapping: aBoolean
      | n indexes i |
      n := subString size.
      indexes := OrderedCollection new.                            "removed the size"
      i := 1.                                                      "1-based"
      [
        i := self findString: subString startingAt: i.             "split condition"
        i > 0]
      whileTrue: [
        indexes add: i.                                            "add: = addLast:"
        i := aBoolean ifTrue: [i + 1] ifFalse: [i + n]].           "new!"
      ^indexes
    

    Make sure you write some few unit tests (and don't forget to exercise the border cases!)