During profiling I came across a function that was taking quite a bit of time, but essentially boiled down to this very simple piece of code:
function GetSubstring(AInput: PChar; AStart, ASubstringLength: Integer): string;
begin
Result := Copy(AInput, AStart, ASubstringLength);
end;
This function returns the expected sub-string, but it doesn't scale very well for longer inputs. I looked at the assembler code in CPU view, and from what I can tell (I'm not normally working at the assembler level), it seems that AInput
is implicitly converted into a string before calling Copy
.
But since at this point the length of the string/character array is unknown, the conversion code has to walk the length of the PChar
until it finds the null terminator. This would explain the terrible scaling for longer inputs.
However, since the caller passes in the length of the PChar
, I initially thought could just convert the method to use SetString
instead.
function GetSubstring(AInput: PChar; AStart, ASubstringLength: Integer): string;
begin
SetString(Result, AInput + AStart - 1, ASubstringLength);
end;
In addition to SetString
working zero-based (not one-based as Copy), there seem to be a number of other little things Copy
does in terms of validating its inputs, not all of which are documented (e.g. any start value less than 1 is changed to 1). So the naïve implementation above doesn't always work as the original one.
My goal is to replicate the Copy
routine as much as possible, as this function is part of a library and has been used extensively by my colleagues.
I'm wondering whether the following implementation accomplishes that, or whether I need to be aware of any other caveats of Copy
. NB: FLength
is the actual length of AInput
that comes from another part in the module this function is part of. I removed that other part for this example.
function GetSubstring(AInput: PChar; AStart, ASubstringLength: Integer): string;
begin
if (AInput = nil) then begin
Result := '';
end else begin
if (AStart < 1) then begin
AStart := 0;
end else begin
AStart := AStart - 1;
end;
if (ASubstringLength + AStart > FLength) then begin
ASubstringLength := FLength - AStart;
end;
SetString(Result, AInput + AStart, ASubstringLength);
end;
end;
I'm using Delphi 2006, but I assume this isn't much different in other versions of the product (at least the non-Unicode ones).
Let's consider the corner cases. I think they are:
AInput
invalid.AStart < 1
.AStart > FLength
.ASubstringLength < 0
.ASubstringLength + (AStart-1) > FLength
.We can ignore case 1 in my opinion. The onus should be on the caller to provide a valid PChar
. Indeed your check that AInput <> nil
is already a step too far in my view because nil
is not a valid PChar
.
Of the rest you have covered 2 and 5, but not 3 and 4. So if the user supplies a value of AStart
that is too large, then you will read off the end of the string. Likewise, the user can readily supply a negative ASubstringLength
. I don't think that you need anyone to write the code to check these cases since you are clearly very competent.
Now, if you really care about every last drop of performance, you should not be checking for any of these case. Demand that the user passes valid parameters. In debug mode, with {$IFOPF D+}
or Assert
you might check inputs. Of course, if these arguments come from external sources, then they should be validated.
On the other hand, the biggest performance hit the original code suffers is the needless scanning of the entire string, and the copying to an intermediate heap allocated string. Once you've removed those, as you have, then the opportunity for further performance gains is much reduced.