Search code examples
rubyalgorithmoptimizationjruby

Optimised string insert algorithm


There is a small part of our software that inserts a string before and after a certain matched string on a huge chunk of code (average length 900000 characters).

Example:

Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.

Replaced with

Lorem Ipsum is simply dummy text of the <span class="class1 class2">printing</span> and typesetting industry. Lorem Ipsum has <span class="class1 class2 class3">been</span> the industry's standard dummy text <span class="class1">ever since the 1500s</span>, when an unknown printer took a galley of type and scrambled it to make a type specimen book.


Ok, so far so good. We could just search and replace, but the contents are somewhat semantical relative so printing in that case was replaced, but might not be in some other place of the text. What we did was index where we wanted to replace the text, so for every replacement we get a starting position and an end position.

Current code:

new_val = huge_string_goes_here
entities.each { |entity|
    add_before = "<span class=\"#{entity.getStuff}\">"
    add_after = '</span>'

    new_val.insert(entity.getStart+increment, add_before)
    increment = increment+add_before.length
    new_val.insert(entity.getEnd+increment, add_after)
    increment = increment+add_after.length
}

Analysing a 900000 character long string is taking around 15-20 seconds.

Does anyone have any suggestions on how we could optimize it?

Thank you


Solution

  • Consider writing a C extension module for Ruby which could find the match indices for you - this sort of operation should be much faster natively than with interpreted code. Once you've got the indices you could use Ruby to insert the before/after text, or if the performance still needs a boost then consider doing it all in C.

    Note that, as with any optimization, the critical thing is to make sure your "optimization" actually improves upon the non-optimized code. Write a benchmark for some sample cases and keep track of the amount of time it takes the pure Ruby code, then run the same benchmark using your native extension and see if the performance is in fact better.