c# indexing api-design in-memory-database

Type safety of indexed columns in an in-memory database


Root of the problem: I want to design a database entity member indexing API that does not require repetition in the model definition and maintains a reasonable level of type safety.

Drawn-out explanation: I have a basic in-memory database of people. Each person has a first and last name and the key of their favorite celebrity, who is another row in the same table.

public class IMDB
{
    public Dictionary<int, Person> people;
}

public class Person
{
    public string firstName;
    public string lastName;
    public int favoriteCelebrityID;
}

Now celebrities want to be able to quickly find their fans through favoriteCelebrityID. An index is the obvious answer, and with future indexed fields in mind, I came up with this interface:

public class IMDB
{
    public Dictionary<int, Person> people;

    private Dictionary<object, Dictionary<int, int>> _rowContents;
    private Dictionary<object, Dictionary<int, List<int>>> _rowIndex;

    // Returns the ID stored in the given field of the given row.
    public int RelatedID(int rowID, object field) { ... }

    // Returns the IDs of all rows whose given field holds the given key.
    public List<int> IDRelatedIDs(int key, object field) { ... }

    // Setter for the field. Maintains the index.
    public void SetRelatedID(int rowID, int key, object field) { ... }
}

public class Person
{
    public enum IndexedFields
    {
        FavoriteCelebrityID
    }

    public string firstName;
    public string lastName;
}

The benefits I saw:

  • Static type safety at the API level, aside from the field being passed in as object.
  • Easy to make an enum for indexed keys in any new tables.
  • No field names repeated in the Person model.
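
For concreteness, the index maintenance behind those three methods could look roughly like the sketch below. This is only an illustration under assumptions, not my actual implementation: ObjectKeyedIndex and PersonIndexedFields are hypothetical stand-ins for IMDB and Person.IndexedFields, and returning a copy of the row list is a choice I made for the sketch.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for Person.IndexedFields.
public enum PersonIndexedFields { FavoriteCelebrityID }

// Hypothetical sketch of the object-keyed index (stand-in for IMDB).
public class ObjectKeyedIndex
{
    // field -> (rowID -> key stored in that row)
    private readonly Dictionary<object, Dictionary<int, int>> _rowContents =
        new Dictionary<object, Dictionary<int, int>>();

    // field -> (key -> IDs of the rows holding that key)
    private readonly Dictionary<object, Dictionary<int, List<int>>> _rowIndex =
        new Dictionary<object, Dictionary<int, List<int>>>();

    public int RelatedID(int rowID, object field)
    {
        // Throws KeyNotFoundException if the field or row was never set.
        return _rowContents[field][rowID];
    }

    public List<int> IDRelatedIDs(int key, object field)
    {
        if (_rowIndex.TryGetValue(field, out var byKey) &&
            byKey.TryGetValue(key, out var rows))
        {
            return new List<int>(rows); // copy, so callers cannot corrupt the index
        }
        return new List<int>();
    }

    public void SetRelatedID(int rowID, int key, object field)
    {
        if (!_rowContents.TryGetValue(field, out var contents))
        {
            contents = new Dictionary<int, int>();
            _rowContents[field] = contents;
        }
        if (!_rowIndex.TryGetValue(field, out var byKey))
        {
            byKey = new Dictionary<int, List<int>>();
            _rowIndex[field] = byKey;
        }

        // Drop the row from the bucket of its previous key, if it had one.
        if (contents.TryGetValue(rowID, out var oldKey) &&
            byKey.TryGetValue(oldKey, out var oldRows))
        {
            oldRows.Remove(rowID);
            if (oldRows.Count == 0) byKey.Remove(oldKey);
        }

        contents[rowID] = key;
        if (!byKey.TryGetValue(key, out var newRows))
        {
            newRows = new List<int>();
            byKey[key] = newRows;
        }
        newRows.Add(rowID);
    }
}
```

Note that passing an enum value as object boxes it on every call, which is one more cost of the untyped field parameter.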

But now I need to index the string lastName too, so I can find families. This is where I'm stuck. My best idea is to implement a triplet of index methods for each data type:

public class IMDB
{
    ...
    public int IDRelatedID(int rowID, object field) { ... }
    public List<int> IDRelatedIDs(int key, object field) { ... }
    public void SetRelatedID(int rowID, int key, object field) { ... }

    public string IDRelatedString(int rowID, object field) { ... }
    public List<int> StringRelatedIDs(string key, object field) { ... }
    public void SetRelatedString(int rowID, string key, object field) { ... }
}

public class Person
{
    public enum IndexedIDs
    {
        FavoriteCelebrityID
    }

    public enum IndexedStrings
    {
        LastName
    }

    public string firstName;
}

But now it is possible to accidentally pass an IndexedStrings value to IDRelatedID. Because the field parameter is typed as object, the compiler accepts it, and the mistake can only be detected at runtime.
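
To make the failure mode concrete, here is a minimal sketch of the only kind of check available with an object parameter. FieldGuard and RequireIDField are hypothetical names I made up for this illustration; the enums are the ones from the Person class above.

```csharp
using System;

public class Person
{
    public enum IndexedIDs { FavoriteCelebrityID }
    public enum IndexedStrings { LastName }
}

// Hypothetical helper: with an object parameter, the field kind
// can only be validated when the call actually runs.
public static class FieldGuard
{
    public static void RequireIDField(object field)
    {
        if (!(field is Person.IndexedIDs))
        {
            throw new ArgumentException(
                "Expected a Person.IndexedIDs value but got " +
                (field?.GetType().Name ?? "null"),
                nameof(field));
        }
    }
}
```

A call such as FieldGuard.RequireIDField(Person.IndexedStrings.LastName) compiles without complaint and only throws once that code path executes.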

Is there a way to expose a member indexing API that:

  • Does not require repetition in the model definitions?
  • Maintains a reasonable level of type safety?

I'm wary of proxy objects mostly due to efficiency concerns, but resources that explain how to make an efficient proxy would be great!


Solution

  • I may have figured out a good solution using generics:

    public class IMDB
    {
        public Dictionary<int, Person> people;
    
        public T RelatedID<T>(int id, IndexedMember<T> field) { ... }
        public List<int> IDRelatedIDs<T>(T key, IndexedMember<T> field) { ... }
        public void SetRelatedID<T>(int id, T newVal, IndexedMember<T> field) { ... }
    }
    
    public class Person
    {
        // "new IndexedMember<int>()" to initialize before first access, or
        // init upon startup (using some other manifest already maintained)
        // for less verbosity.
        public static IndexedMember<int> favoriteCelebrityID;
        public static IndexedMember<string> lastName;
        public string firstName;
    }
    
    public class IndexedMember<T>
    {
        // ...
    }
    

    If I can implement this I'll be very satisfied with it, but any improvements or alternate solutions are always welcome.
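
One way IndexedMember<T> might be filled in is sketched below. It is only a sketch under assumptions: the member names Get, RowsWith and Set are hypothetical, each IndexedMember<T> is assumed to own both the stored values and the reverse index for its column, and the lookup is assumed to take a key of type T and return row IDs. Null keys are not supported, since Dictionary rejects them.

```csharp
using System.Collections.Generic;

// Hypothetical sketch: one instance per indexed column.
public class IndexedMember<T>
{
    // rowID -> value stored in this column
    private readonly Dictionary<int, T> _valueByRow = new Dictionary<int, T>();

    // value -> IDs of the rows holding that value
    private readonly Dictionary<T, HashSet<int>> _rowsByValue =
        new Dictionary<T, HashSet<int>>();

    // Throws KeyNotFoundException if the row was never set.
    public T Get(int rowID) => _valueByRow[rowID];

    public List<int> RowsWith(T key) =>
        _rowsByValue.TryGetValue(key, out var rows)
            ? new List<int>(rows)
            : new List<int>();

    public void Set(int rowID, T newVal)
    {
        // Remove the row from the bucket of its previous value, if any.
        if (_valueByRow.TryGetValue(rowID, out var oldVal) &&
            _rowsByValue.TryGetValue(oldVal, out var oldRows))
        {
            oldRows.Remove(rowID);
            if (oldRows.Count == 0) _rowsByValue.Remove(oldVal);
        }

        _valueByRow[rowID] = newVal;
        if (!_rowsByValue.TryGetValue(newVal, out var newRows))
        {
            newRows = new HashSet<int>();
            _rowsByValue[newVal] = newRows;
        }
        newRows.Add(rowID);
    }
}

public class IMDB
{
    public T RelatedID<T>(int id, IndexedMember<T> field) => field.Get(id);
    public List<int> IDRelatedIDs<T>(T key, IndexedMember<T> field) => field.RowsWith(key);
    public void SetRelatedID<T>(int id, T newVal, IndexedMember<T> field) => field.Set(id, newVal);
}

public class Person
{
    public static readonly IndexedMember<int> favoriteCelebrityID = new IndexedMember<int>();
    public static readonly IndexedMember<string> lastName = new IndexedMember<string>();
    public string firstName;
}
```

With this shape, db.SetRelatedID(1, "Smith", Person.lastName) compiles, while db.SetRelatedID(1, 5, Person.lastName) is rejected at compile time because T cannot be both int and string. One caveat of the static members: the index data lives in Person rather than in an IMDB instance, so two IMDB objects would share the same index.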