Search code examples
functional-programmingtime-complexitystring-concatenation

Time complexity of multiple string concatenation (join) in functional programming languages


Am I right that the only algorithm that can be implemented in functional programming languages like Haskell to concatenate multiple strings (i.e. implement join that transforms list of lines ["one", "two", "three"] into one line "onetwothree") has time complexity of order O(n^2), like described in this well-known post?

E.g. if I work with immutable strings, for example, in Python, and try to implement join, I'll get something like

def myjoin(list_of_strings):
   return list_of_strings[0] + myjoin(list_of_strings[1:])

Is it true that it is not possible to make it faster, for example, in Haskell?


Solution

  • First of all Haskell is lazily: this means that if you write:

    concat ["foo", "bar", "qux"]
    

    it will not perform this operation until you request for instance the first character of the outcome. In that case usually it will not concatenate all strings together, but - depending on how smart the function is implemented - aim to do the minimal amount of work necessary to obtain the first character. If you request the first character, but do not inspect it, it could be possible that you for instance got succ 'f' instead of 'g' since again Haskell is lazy.

    But let's assume that we are interested in the resulting string, and want to know every character. We can implement concat as:

    concat :: [[a]] -> [a]
    concat [] = []
    concat (x:xs) = x ++ concat xs
    

    and (++) as:

    (++) :: [a] -> [a] -> [a]
    (++) []     ys = ys
    (++) (x:xs) ys = x : (xs ++ ys)
    

    Now that means that - given (:) works in O(1) - (++) works in O(a) with a the length of the first list, and b (note that this is not in the big oh expression) the length of the second list.

    So now if we inspect concat, we see that if we enter k strings, we will perform k (++) operations. At every (++) operation, the left string is equal to the length of the string. So that means that if the sum of the lengths of the strings is n, concat is an O(n) algorithm.