I'm trying to measure the impact of string interning in an application.
I came up with this:
class Program
{
static void Main(string[] args)
{
_ = BenchmarkRunner.Run<Benchmark>();
}
}
[MemoryDiagnoser]
public class Benchmark
{
[Params(10000, 100000, 1000000)]
public int Count { get; set; }
[Benchmark]
public string[] NotInterned()
{
var a = new string[this.Count];
for (var i = this.Count; i-- > 0;)
{
a[i] = GetString(i);
}
return a;
}
[Benchmark]
public string[] Interned()
{
var a = new string[this.Count];
for (var i = this.Count; i-- > 0;)
{
a[i] = string.Intern(GetString(i));
}
return a;
}
private static string GetString(int i)
{
var result = (i % 10).ToString();
return result;
}
}
But I always end up with the same amount of allocated.
Is there any other measure or diagnostic that gives me the memory savings of using string.Intern()?
The main question here is what kind of impact do you want to measure? To be more specific: what are your target metrics? Here are some examples: performance metrics, memory traffic, memory footprint.
In the BenchmarkDotNet Allocated column, you get the memory traffic. string.Intern
doesn't help to optimize it in your example, each (i % 10).ToString()
call will allocate a new string. Thus, it's expected that BenchmarkDotNet shows the same numbers in the Allocated column.
However, string.Intern
should help you to optimize the memory footprint of your application at the end (the total managed heap size, can be fetched via GC.GetTotalMemory()
). It can be verified with a simple console application without BenchmarkDotNet:
using System;
namespace ConsoleApp24
{
class Program
{
private const int Count = 100000;
private static string[] notInterned, interned;
static void Main(string[] args)
{
var memory1 = GC.GetTotalMemory(true);
notInterned = NotInterned();
var memory2 = GC.GetTotalMemory(true);
interned = Interned();
var memory3 = GC.GetTotalMemory(true);
Console.WriteLine(memory2 - memory1);
Console.WriteLine(memory3 - memory2);
Console.WriteLine((memory2 - memory1) - (memory3 - memory2));
}
public static string[] NotInterned()
{
var a = new string[Count];
for (var i = Count; i-- > 0;)
{
a[i] = GetString(i);
}
return a;
}
public static string[] Interned()
{
var a = new string[Count];
for (var i = Count; i-- > 0;)
{
a[i] = string.Intern(GetString(i));
}
return a;
}
private static string GetString(int i)
{
var result = (i % 10).ToString();
return result;
}
}
}
On my machine (Linux, .NET Core 3.1), I got the following results:
802408
800024
2384
The first number and the second number are the memory footprint impacts for both cases. It's pretty huge because the string array consumes a lot of memory to keep the references to all the string instances.
The third number is the footprint difference between the footprint impact of interned and not-interned string. You may ask why it's so small. This can be easily explained: Stephen Toub implemented a special cache for single-digit strings in dotnet/coreclr#18383, it's described in his blog post:
So, it doesn't make sense to measure interning of the "0"
.."9"
strings on .NET Core. We can easily modify our program to fix this problem:
private static string GetString(int i)
{
var result = "x" + (i % 10).ToString();
return result;
}
Here are the updated results:
4002432
800344
3202088
Now the impact difference (the third number) is pretty huge (3202088). It means that interning helped us to save 3202088 bytes in the managed heap.
So, there are the most important recommendation for your future experiments: