My goal is simply to replace substrings, but very frequently. The program runs on Android.
For example, given the string "{a} is a good {b}." and the map {{a}=Bob, {b}=boy}, the result should be "Bob is a good boy."
I need to do this kind of replacement on different strings up to 400 times per second, because the map's values are updated in real time.
For performance I use a trie and an Aho-Corasick automaton. Here's the core fragment:
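```kotlin
// Fragment of a class. Not shown (defined elsewhere in the class):
//   trie   - transition table; node 1 is the root, node 0 is presumably a
//            sentinel whose transitions all lead back to the root
//   fail   - failure links
//   end    - length of the tag that ends at a node, or 0 if none does
//   values - tag -> replacement map, e.g. "{a}" -> "Bob"
//   getFail (flag) / getFail() - builds the failure links once
private val builder = StringBuilder()   // reused across calls

private fun replace(str: String): String {
    if (!getFail) {
        getFail()
    }
    var p = 1                 // current automaton state, starting at the root
    builder.setLength(0)      // reuse the builder's buffer
    for (c in str) {
        builder.append(c)
        if (c.code !in 0..126) {
            p = 1             // outside the trie's alphabet: no tag can span this char
            continue
        }
        var k = trie[p][c.code]
        while (k > 1) {       // walk the fail chain, stopping at the root
            if (end[k] != 0) {
                // a tag of length end[k] ends at the current position
                val last = builder.length - end[k]
                // substring() allocates a lookup key for the map
                values[builder.substring(last, builder.length)]?.let {
                    builder.replace(last, builder.length, it)
                }
                p = 0         // reset; the next transition returns to the root
                break
            }
            k = fail[k]       // no tag ends here; try the next shorter suffix
        }
        p = trie[p][c.code]
    }
    return builder.toString() // allocates a new String on every call
}
```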
As you can see, I use a StringBuilder to reuse memory, but in the end I have to call StringBuilder.toString() to return the result, and that creates a new String object. The result is very short-lived and the replacement function is called very frequently, so the JVM has to GC frequently.
Is there any way to reuse the memory occupied by the short-lived result string? Or is there some other solution?
> Is there any way to reuse the memory occupied by the short-lived result string?

No.

> Or is there some other solution?
If you can change the code that consumes the String objects produced by this method so that it accepts a CharSequence instead, then you can pass it the StringBuilder instance in builder directly and avoid the toString() call.
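A minimal sketch of that idea, assuming a simplified "{name}" scanner in place of the Aho-Corasick automaton from the question. TemplateReplacer and countUpperCase are hypothetical names. The method returns the reused builder typed as CharSequence, so no result String is built per call (the map lookup still allocates a small key via substring).

```kotlin
// Hypothetical replacer: returns its reused builder as a CharSequence
// instead of calling toString(). A plain "{name}" scanner stands in for
// the Aho-Corasick automaton used in the question.
class TemplateReplacer(private val values: Map<String, String>) {
    private val builder = StringBuilder()   // reused across calls

    fun replace(template: String): CharSequence {
        builder.setLength(0)
        var i = 0
        while (i < template.length) {
            val close = if (template[i] == '{') template.indexOf('}', i + 1) else -1
            val value = if (close > i) values[template.substring(i, close + 1)] else null
            if (value != null) {
                builder.append(value)        // known tag such as "{a}": emit its value
                i = close + 1
            } else {
                builder.append(template[i])  // ordinary char, or brace of an unknown tag
                i++
            }
        }
        return builder  // no toString(): callers only see the CharSequence interface
    }
}

// Hypothetical consumer that only needs read access to the text.
fun countUpperCase(text: CharSequence): Int = text.count { it.isUpperCase() }

fun main() {
    val replacer = TemplateReplacer(mapOf("{a}" to "Bob", "{b}" to "boy"))
    val result = replacer.replace("{a} is a good {b}.")
    println(result)                  // Bob is a good boy.
    println(countUpperCase(result))  // 1
}
```

Note that println() converts its argument to a String internally; the saving only materializes if the real consumer works on the CharSequence directly.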
The problem is that you wouldn't be able to prevent something from casting the CharSequence to StringBuilder and mutating it. (But if the code is not security-critical, you could ignore that. It would be hard to do that by accident, especially if you use the CharSequence interface type when passing the StringBuilder around.)
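If the cast does matter, one option (not part of the original answer) is to hand out a read-only wrapper instead of the builder itself. The sketch below uses Kotlin interface delegation; CharSequenceView is a hypothetical name. The wrapper is created once around the reused builder, so it adds no per-call allocation, and casting it to StringBuilder fails at run time.

```kotlin
// Read-only view over a reused StringBuilder. Interface delegation forwards
// length, get() and subSequence() to the builder; subSequence() on a
// StringBuilder returns a new String, so it cannot be used for mutation.
class CharSequenceView(private val target: StringBuilder) : CharSequence by target {
    // toString() is not part of the CharSequence interface, so forward it explicitly.
    override fun toString(): String = target.toString()
}

fun main() {
    val builder = StringBuilder()
    val view: CharSequence = CharSequenceView(builder)  // allocated once, reused
    builder.append("Bob is a good boy.")
    println(view.length)            // 18
    println(view is StringBuilder)  // false: the cast-and-mutate trick no longer works
}
```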
The other problem is that the caller actually gets the same object every time, with different state. It can't keep that state ... unless it calls toString() on it.
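A small illustration of that aliasing pitfall, using a hypothetical ReusingFormatter that hands out its single buffer. A caller that holds on to an earlier result sees it change on the next call, while a toString() snapshot stays stable.

```kotlin
// Hypothetical formatter that always returns the same reused buffer.
class ReusingFormatter {
    private val builder = StringBuilder()

    fun greet(name: String): CharSequence {
        builder.setLength(0)
        builder.append(name).append(" is a good boy.")
        return builder
    }
}

fun main() {
    val f = ReusingFormatter()
    val first = f.greet("Bob")
    val kept = first.toString()        // independent String snapshot
    f.greet("Tom")                     // overwrites the shared buffer
    println(first)                     // Tom is a good boy.  (changed under the caller)
    println(kept)                      // Bob is a good boy.
    println(first === f.greet("Ann"))  // true: it is always the same object
}
```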
But you may be worrying unnecessarily about performance. The GC is relatively good at dealing with short-lived objects. With a copying young-generation collector, an object that is already unreachable at the first GC cycle after it is created is never marked or copied, so reclaiming it costs essentially nothing. To a first approximation, it is the reachable objects in the "from" space that will cost you.
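To make that concrete, here is a toy model of a Cheney-style copying collection. It is a simplification for illustration only, not how ART or HotSpot is actually implemented; Obj and copyCollect are made-up names. The collector starts from the roots and only ever visits reachable objects, so its work counter tracks survivors rather than the number of allocations.

```kotlin
// Toy heap object; identity equality/hashCode (no data class) on purpose.
class Obj(val id: Int, val refs: MutableList<Obj> = mutableListOf())

// Copies everything reachable from the roots into to-space (breadth-first,
// as in Cheney's algorithm). Garbage left in from-space is never touched.
fun copyCollect(roots: List<Obj>): Pair<List<Obj>, Int> {
    val toSpace = ArrayList<Obj>()
    val copied = HashSet<Obj>()         // guards against copying an object twice
    for (r in roots) if (copied.add(r)) toSpace.add(r)
    var scan = 0
    var work = 0
    while (scan < toSpace.size) {
        val o = toSpace[scan++]
        work++                          // one unit of work per surviving object
        for (child in o.refs) if (copied.add(child)) toSpace.add(child)
    }
    return toSpace to work
}

fun main() {
    val live = Obj(0)
    live.refs += Obj(1)
    val fromSpace = mutableListOf(live, live.refs[0])
    repeat(10_000) { fromSpace += Obj(2 + it) }  // short-lived garbage, e.g. result strings
    val (survivors, work) = copyCollect(roots = listOf(live))
    println("allocated=${fromSpace.size} survivors=${survivors.size} work=$work")
    // allocated=10002 survivors=2 work=2
}
```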
I would first do some profiling and GC monitoring. Only go down the path of changing your code as above if there is clear evidence that the short-lived strings are causing a performance problem.
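If you want a quick first number before reaching for a real profiler (such as the memory profiler in Android Studio), a crude timing loop can compare the two approaches. This is only a sketch: buildResult is a hypothetical stand-in for replace(), wall-clock timings from a loop like this are noisy and JIT-dependent, and they do not show GC pauses directly.

```kotlin
// Hypothetical stand-in for replace(): fills the reused builder and returns it.
fun buildResult(builder: StringBuilder, i: Int): StringBuilder {
    builder.setLength(0)
    builder.append("Bob").append(" is a good ").append("boy").append(" #").append(i)
    return builder
}

fun main() {
    val builder = StringBuilder()
    var sink = 0  // consume every result so the work cannot be optimized away
    repeat(5) { round ->
        val t0 = System.nanoTime()
        repeat(400) { i -> sink += buildResult(builder, i).toString().length }  // copies
        val t1 = System.nanoTime()
        repeat(400) { i -> sink += buildResult(builder, i).length }             // no copies
        val t2 = System.nanoTime()
        println("round $round: toString=${(t1 - t0) / 1000} us, reuse=${(t2 - t1) / 1000} us")
    }
    println(sink)
}
```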
(My intuition is that 400 short-lived strings per second should not be a problem, assuming that 1) they are not huge and 2) you picked a GC that is suitable for your use case.)
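A back-of-the-envelope estimate of the allocation rate helps put that intuition in numbers. The 200-character average below is an assumed figure, and the formula counts 2 bytes per char while ignoring object headers and array overhead, so treat the result as a rough estimate only.

```kotlin
// Rough allocation rate of the result strings, in bytes per second.
// Assumes UTF-16 storage (2 bytes per char); headers and padding are ignored.
fun bytesPerSecond(callsPerSecond: Int, avgChars: Int, bytesPerChar: Int = 2): Long =
    callsPerSecond.toLong() * avgChars * bytesPerChar

fun main() {
    val rate = bytesPerSecond(callsPerSecond = 400, avgChars = 200)  // 200 chars is an assumption
    println("$rate bytes/s ~ ${rate / 1024} KiB/s")  // 160000 bytes/s ~ 156 KiB/s
}
```

Under those assumptions the strings produce on the order of 156 KiB of garbage per second, which is likely to be modest for a generational collector; plug in your real average string length to check.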