benchmarkdotnet .net-4.8

How to measure string interning?


I'm trying to measure the impact of string interning in an application.

I came up with this:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

class Program
{
    static void Main(string[] args)
    {
        _ = BenchmarkRunner.Run<Benchmark>();
    }
}

[MemoryDiagnoser]
public class Benchmark
{
    [Params(10000, 100000, 1000000)]
    public int Count { get; set; }

    [Benchmark]
    public string[] NotInterned()
    {
        var a = new string[this.Count];
        for (var i = this.Count; i-- > 0;)
        {
            a[i] = GetString(i);
        }
        return a;
    }

    [Benchmark]
    public string[] Interned()
    {
        var a = new string[this.Count];
        for (var i = this.Count; i-- > 0;)
        {
            a[i] = string.Intern(GetString(i));
        }
        return a;
    }

    private static string GetString(int i)
    {
        var result = (i % 10).ToString();
        return result;
    }
}

But I always end up with the same amount of allocated memory in both benchmarks.

Is there any other measure or diagnostic that gives me the memory savings of using string.Intern()?


Solution

  • The main question here is what kind of impact do you want to measure? To be more specific: what are your target metrics? Here are some examples: performance metrics, memory traffic, memory footprint.

    In the BenchmarkDotNet Allocated column, you get the memory traffic. string.Intern doesn't help to optimize it in your example because each (i % 10).ToString() call still allocates a new string before string.Intern ever sees it. Thus, it's expected that BenchmarkDotNet shows the same numbers in the Allocated column for both benchmarks.
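
    To make the difference between traffic and footprint concrete, here is a minimal sketch (the class name InternIdentityDemo and the helper Make are made up for illustration). It shows that two equal strings built at run time are separate heap objects, so both count as allocations, while string.Intern maps them to one shared instance:

    ```csharp
    using System;

    public static class InternIdentityDemo
    {
        // Hypothetical helper: builds the string at run time,
        // so every call allocates a fresh object.
        public static string Make(int i) => "x" + (i % 10).ToString();

        public static void Main()
        {
            string a = Make(7);
            string b = Make(7);

            // Equal contents, but two separate heap objects: both were allocated.
            Console.WriteLine(a == b);                       // True
            Console.WriteLine(object.ReferenceEquals(a, b)); // False

            // Intern returns one pooled instance for both inputs;
            // the temporaries a and b become garbage once nothing references them.
            Console.WriteLine(object.ReferenceEquals(string.Intern(a), string.Intern(b))); // True
        }
    }
    ```

    The allocations happen either way, which is what the Allocated column counts. Only the number of strings that stay reachable afterwards differs.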

    However, string.Intern should help you to reduce the final memory footprint of your application (the total managed heap size, which can be fetched via GC.GetTotalMemory()). This can be verified with a simple console application without BenchmarkDotNet:

    using System;
    
    namespace ConsoleApp24
    {
        class Program
        {
            private const int Count = 100000;
            private static string[] notInterned, interned;
    
            static void Main(string[] args)
            {
                var memory1 = GC.GetTotalMemory(true);
                notInterned = NotInterned();
                var memory2 = GC.GetTotalMemory(true);
                interned = Interned();
                var memory3 = GC.GetTotalMemory(true);
                Console.WriteLine(memory2 - memory1);
                Console.WriteLine(memory3 - memory2);
                Console.WriteLine((memory2 - memory1) - (memory3 - memory2));
            }
    
            public static string[] NotInterned()
            {
                var a = new string[Count];
                for (var i = Count; i-- > 0;)
                {
                    a[i] = GetString(i);
                }
                return a;
            }
    
            public static string[] Interned()
            {
                var a = new string[Count];
                for (var i = Count; i-- > 0;)
                {
                    a[i] = string.Intern(GetString(i));
                }
                return a;
            }
    
            private static string GetString(int i)
            {
                var result = (i % 10).ToString();
                return result;
            }
        }
    }
    

    On my machine (Linux, .NET Core 3.1), I got the following results:

    802408
    800024
    2384
    

    The first and the second numbers are the memory footprint impacts of the two cases. Both are large because the string array itself needs a lot of memory to hold the references to all the string instances.

    The third number is the difference between the footprint impact of the not-interned strings and that of the interned strings. You may ask why it's so small. This can be easily explained: Stephen Toub implemented a special cache for single-digit strings in dotnet/coreclr#18383, which he also describes in a blog post.

    So, it doesn't make sense to measure interning of the "0".."9" strings on .NET Core. We can easily modify our program to fix this problem:

    private static string GetString(int i)
    {
        var result = "x" + (i % 10).ToString();
        return result;
    }
    

    Here are the updated results:

    4002432
    800344
    3202088
    

    Now the impact difference (the third number) is pretty huge (3202088). It means that interning helped us to save 3202088 bytes in the managed heap.
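
    As a sanity check, these numbers can be reproduced approximately with back-of-the-envelope arithmetic. The sketch below is only an estimate: the object layout it uses (16 bytes of per-object overhead, 8-byte references, 8-byte alignment) is an assumption about a 64-bit runtime, not a documented guarantee, and the names SizeEstimate, StringBytes, and ReferenceArrayBytes are made up for illustration.

    ```csharp
    using System;

    public static class SizeEstimate
    {
        // Assumed 64-bit layout: 8-byte object header + 8-byte type pointer.
        private const int ObjectOverhead = 16;

        // Assumption: overhead + 4-byte length + UTF-16 chars + 2-byte terminator,
        // rounded up to a multiple of 8 bytes.
        public static long StringBytes(int length) =>
            Align8(ObjectOverhead + 4 + 2L * (length + 1));

        // Assumption: overhead + 8-byte length slot + one 8-byte reference per element.
        public static long ReferenceArrayBytes(int count) =>
            ObjectOverhead + 8 + 8L * count;

        private static long Align8(long n) => (n + 7) & ~7L;

        public static void Main()
        {
            const int count = 100000;
            Console.WriteLine(ReferenceArrayBytes(count)); // 800024
            Console.WriteLine(StringBytes(2));             // 32
            Console.WriteLine(count * StringBytes(2));     // 3200000
        }
    }
    ```

    Under these assumptions the array alone accounts for 800024 bytes, and 100000 separate two-character strings such as "x7" account for about 3200000 bytes, which is close to the measured saving of 3202088 bytes.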

    So, here are the most important recommendations for your future experiments:

    • Carefully define the metrics that you actually want to measure. Don't say "I want to find all kinds of affected metrics": any change in the source code may affect hundreds of different metrics, and it's pretty hard to measure all of them in each experiment. Think carefully about which metrics are really important to you.
    • Try to use input data that are close to your actual work scenarios. Benchmarking with "dummy" data may lead to incorrect results because the runtime contains many tricky optimizations that work very well in exactly such "dummy" cases.
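
    If the memory footprint is the metric you settle on, the measurement from the console application above can be wrapped in a small reusable helper. This is a minimal sketch, not part of BenchmarkDotNet: FootprintMeter, RetainedBytes, and Build are hypothetical names, and the result is only approximate because anything else the process allocates and keeps alive between the two GC.GetTotalMemory calls is counted as well.

    ```csharp
    using System;

    public static class FootprintMeter
    {
        // Approximate number of bytes retained by the object graph that build() returns.
        public static long RetainedBytes(Func<object> build)
        {
            long before = GC.GetTotalMemory(forceFullCollection: true);
            object graph = build();
            long after = GC.GetTotalMemory(forceFullCollection: true);
            GC.KeepAlive(graph); // keep the graph reachable until after the second reading
            return after - before;
        }

        // Same data shape as in the answer; replace with data from your real workload.
        public static string[] Build(int count, bool intern)
        {
            var a = new string[count];
            for (int i = 0; i < count; i++)
            {
                string s = "x" + (i % 10).ToString();
                a[i] = intern ? string.Intern(s) : s;
            }
            return a;
        }

        public static void Main()
        {
            const int count = 100000;
            long plain = RetainedBytes(() => Build(count, intern: false));
            long pooled = RetainedBytes(() => Build(count, intern: true));
            Console.WriteLine($"not interned: {plain} bytes");
            Console.WriteLine($"interned:     {pooled} bytes");
            Console.WriteLine($"saved:        {plain - pooled} bytes");
        }
    }
    ```

    Feeding Build with strings taken from your real workload, rather than the "x0".."x9" pattern, is what makes the reported saving meaningful for your application.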