I was recently asked to create a program that finds the best matches in a text fragment. I have written this program successfully, but I have a question about its time complexity.
The problem is defined as follows: given a query, find the occurrences of the query words in a document and highlight the best tokens.
The time that my program takes is O(m + n + p), where:

m = length of the document in characters
n = length of the query in characters
p = total number of matches in the document
In this case, the biggest term is always going to be m, because in most cases the document is going to be larger than the query itself.
Can I safely deduce that the time complexity of my program is O(m)?
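For reference, here is a simplified sketch of the kind of single-pass scan I mean (illustrative only, not my actual code; the tokenisation is deliberately naive):

```python
def find_query_words(document: str, query: str) -> list[tuple[int, int]]:
    """Return (start, end) spans of document tokens that appear in the query.

    Rough cost: building the word set is O(n), the scan is O(m), and
    recording the matches is O(p).
    """
    words = set(query.lower().split())              # O(n)
    spans = []
    start = None
    for i, ch in enumerate(document + " "):         # single O(m) pass; trailing space flushes the last token
        if ch.isalnum():
            if start is None:
                start = i
        elif start is not None:
            if document[start:i].lower() in words:  # whole-token lookup
                spans.append((start, i))            # O(p) appends in total
            start = None
    return spans
```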
No, you can't. According to Big-O notation, saying your program runs in O(m) means that m is an upper bound on the actual time your algorithm takes to run: there must be a constant M such that the real running time is always less than or equal to M*m. Take the case where the document has size zero (an empty document) but someone queries it with a positive number of characters. The upper bound in that case would be 0 (plus a constant), but the actual time the program takes to run can be greater than that. So your program cannot be said to be O(m).
In other words, "most cases" isn't enough: you must prove that your algorithm will perform within that upper bound in all cases.
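To see why, here is a hypothetical matcher (not your code, just an illustration) where the query-handling work alone already breaks any bound of the form M*m on an empty document:

```python
def find_matches(document: str, query: str) -> list[str]:
    # Hypothetical sketch, only to illustrate the argument above.
    # Normalising the query costs time proportional to n even when the
    # document is empty (m = 0), so no constant M can make the running
    # time fit under M * m, which would be M * 0 = 0.
    words = [w.lower() for w in query.split()]   # Theta(n) work, independent of m
    if not document:                             # m = 0: the Theta(n) work is already spent
        return []
    return [w for w in words if w in document]   # document-dependent part of the work
```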
Update: The same can be said for p: common sense says p is always smaller than m, but that's only true if the search terms don't overlap. Take, for instance, the document aaaaaa (m = 6) and the search terms a, aa and aaa (n = 6, counting only the characters of the three terms). In this case, there are 6 occurrences of a, 5 of aa and 4 of aaa, so p = 15. Even though it's a very unlikely scenario (the same goes for the empty document), you are still required to take p into account in your complexity analysis. So your program really must be described as O(m + n + p), as you originally stated.
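If you want to verify the count of 15, a small helper that counts overlapping occurrences (str.count skips overlaps, so it would undercount here) gives the same numbers:

```python
def count_overlapping(document: str, term: str) -> int:
    """Count occurrences of term in document, including overlapping ones."""
    count = 0
    pos = document.find(term)
    while pos != -1:
        count += 1
        pos = document.find(term, pos + 1)   # advance by one so overlaps are counted
    return count

document = "aaaaaa"
terms = ["a", "aa", "aaa"]
print([count_overlapping(document, t) for t in terms])    # [6, 5, 4]
print(sum(count_overlapping(document, t) for t in terms)) # 15
```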