Search code examples
rubystringalgorithmfull-text-searchsuffix-array

Suffix array and search a substring in a string


I found an implementation of suffix array in Ruby and changed it a bit. Here is what I have:

class SuffixArray
    def initialize(str)
        @string = str
        @suffix_array = []
        (0...str.length).each do |i|
            substring = @string[i...str.length]
            @suffix_array << {:suffix=>substring, :index => i}
        end

        @sorted_suffix_array = @suffix_array.sort {|x,y| x[:suffix] <=> y[:suffix]}
    end

    def print_sorted
      @sorted_suffix_array.each {|item| puts "#{item[:index]}=>#{item[:suffix]}"}
      puts "total=>#{@sorted_suffix_array.size()}"
    end

    def print_unsorted
      @suffix_array.each {|item| puts "#{item[:index]}=>#{item[:suffix]}"}
      puts "total=>#{@suffix_array.size()}"
    end

    def find_substring(substring)
        low = 0
        high = @sorted_suffix_array.length
        while(low <= high) do
            mid = (low + high) / 2
            comparison = @sorted_suffix_array[mid][:suffix]#[0..substring.length]
      if comparison > substring
        high = mid - 1
      elsif comparison < substring
        low = mid + 1
      else 
        return @sorted_suffix_array[mid][:index]
      end
        end
    end

end

It works good but it doesn't find all substrings I want. For example

a = SuffixArray.new("there is a man who likes dogs")
puts a.find_substring("man") #won't be found
puts a.find_substring("who likes dogs") #will be found
puts a.find_substring("o likes dogs") #will be found

How do I change the algorithm to make it find all the substrings I want?


Solution

  • Your code was almost correct. I made some small modifications and it works.

    def find_substring(substring)
      low = 0
      high = @sorted_suffix_array.length-1
      while(low <= high) do
        mid = (low + high) / 2
        comparison = @sorted_suffix_array[mid][:suffix][0...substring.length]
        if comparison > substring
          high = mid - 1
        elsif comparison < substring
          low = mid + 1
        else 
          return @sorted_suffix_array[mid][:index]
        end
      end
    end