If you have a DFA with m nodes and you run it on a string with n characters, at each step you have to not only test states inherited from previous step but also the first state of the DFA again. So after m character in the string (assuming m < n) worst case scenario is that you have mn actives states to test (each state need only one lookup to advance or be dismissed)
An example, consider a{l}b regular expression (all words starting with a repeated l times, follow by a b), its DFA have m = l + 1 nodes. Matching it against a string a{k}b with k>l means you will hit the worst case scenario of having (m - 1) active states after l characters in the string.
What did i miss ? Or does the literature hand wave the practical implementation to only concern itself with the theoretical question of knowing if a given full string (ie not one of it sub-string) match a regular expression.
From where i stand running an NFA or DFA will take O(nm) times (with m being number of node in NFA or DFA and n number of characters). Only thing is that NFA have more nodes than a DFA.
Historically, DFAs were first defined to match entire strings rather than to search for substrings, which is why the literature typically talks about the time complexity of a DFA with regards to taking in a single string and then returning whether the whole string matches or not. If you have a DFA that matches an entire string and you want to use it to search for substrings, then you're essentially running the DFA multiple times, once for each possible start position, which is why you're getting O(mn) as your runtime rather than O(n).
However, if your goal was to match a substring somewhere, you're likely to be better off to redesign your DFA. Imagine for example that you want to match some regex R using a DFA. Rather than building a DFA for R and running it starting at each possible location, build a DFA for the regex Σ* R Σ* . Now, if any substring of the input matches R, the whole string matches Σ* R Σ *, so you only need to run a single pass of the DFA over the string. That drops the runtime down to O(n) since you're just running a single pass.