I wish to associate an in-memory cached data structure with each of a set of interned strings, and use a passed instance of an interned string to look up its associated data structure.
The predefined set of strings will be around 1000 in number. Cache population costs can be ignored but I want high performance lookup.
public class InternedExtras
{
    public DateTime Prop1 { get; set; }
    public Decimal Prop2 { get; set; }
}
Ideally I would create a Dictionary keyed on an interned string's reference, but .NET does not expose object references as a specific type.
If I declare my Dictionary as:
Dictionary<string, InternedExtras>
then I am concerned that the System.String equality override will invoke char by char string value comparison during dictionary lookup, which will be inefficient.
An option would be:
Dictionary<int, InternedExtras> _extrasDictionary;
InternedExtras GetInternedExtras( string knownToBeInterned )
{
    return _extrasDictionary[ knownToBeInterned.GetHashCode() ];
}
However, I have never fully understood hash code maths, and I understand that uniqueness is not guaranteed.
The average length of my interned strings is 50 chars and I can deploy to the latest .NET version.
I actually think this is your most efficient option:
Dictionary<string, InternedExtras> _extrasDictionary;
Doing a lookup as follows is actually very efficient!
InternedExtras extras = _extrasDictionary[interned];
The char-by-char comparison that you refer to will only be performed against a small subset of the keys. This is because interned.GetHashCode() is used to group the keys into "buckets", and only keys that land in the same bucket as the lookup key are compared with it.
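To make this concrete, here is a minimal sketch rather than a definitive implementation. The ExtrasCache class and its member names are hypothetical, invented for illustration. It shows the plain string-keyed dictionary next to a variant that really is keyed on the reference, by passing ReferenceEqualityComparer.Instance as the comparer (this assumes .NET 5 or later, which the "latest .NET version" remark suggests is available). As far as I know, string equality already checks reference identity before comparing characters, so for an interned key most of the default lookup cost is hashing the 50 or so characters. Whether the reference-keyed variant is measurably faster for about 1000 keys is something to benchmark, not assume.

```csharp
using System;
using System.Collections.Generic;

public class InternedExtras
{
    public DateTime Prop1 { get; set; }
    public decimal Prop2 { get; set; }
}

// Hypothetical helper class, for illustration only.
public static class ExtrasCache
{
    // Default comparer: hashes the string's characters, then compares the key
    // only against entries that fall into the same bucket.
    private static readonly Dictionary<string, InternedExtras> ByValue =
        new Dictionary<string, InternedExtras>();

    // Assumption: .NET 5+. Keys are hashed and compared by object identity,
    // so the characters of the string are never read during lookup.
    private static readonly Dictionary<string, InternedExtras> ByReference =
        new Dictionary<string, InternedExtras>(ReferenceEqualityComparer.Instance);

    public static void Add(string key, InternedExtras extras)
    {
        // Intern on the way in so both dictionaries hold the canonical instance.
        string interned = string.Intern(key);
        ByValue[interned] = extras;
        ByReference[interned] = extras;
    }

    // Works for any string with the same characters, interned or not.
    public static bool TryGetByValue(string key, out InternedExtras extras)
    {
        return ByValue.TryGetValue(key, out extras);
    }

    // Only finds the entry when passed the interned instance itself.
    public static bool TryGetByReference(string knownToBeInterned, out InternedExtras extras)
    {
        return ByReference.TryGetValue(knownToBeInterned, out extras);
    }
}
```

The trade-off is that the reference-keyed variant silently misses if a caller ever passes an equal but non-interned string, whereas the plain Dictionary<string, InternedExtras> stays correct in that case. Keying on GetHashCode() as an int has neither property: two different strings can share a hash code and would then overwrite each other's entry.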
This question has much more detail on the subject: