Search code examples
delphidelphi-10-seattle

Variable Type for searching both with id and string


I need to store in memory about 500-1000 entries of 3 fields with quick and effective search by both int and str values. Searching happens in quick bursts of about 300-500 requests. I'm not sure how to efficiently do it.

Stored data consists of 3 fields:

  1. ID - Integer, which won't be sequential. I.e. it will be like (3, 5, 12, 55), but not (1, 2, 3, 4, 5).
  2. Name - String.
  3. Tags - String.

There are 3 possible scenarios:

  1. Get ID by Name.
  2. Get Name by ID.
  3. Get Tags by ID.

Currently, I use two different Types:

  1. THashedStringList with keypairs '%s=%i' to search by name.
  2. Array of Records sorted by ID for other searches.

I find this highly inefficient and currently looking for new ideas. Any hints?


Solution

  • As David Heffernan suggested, you might want to use a proper database for this.

    But if you want a more lightweight solution, with excellent performance, you can use an object list to store all your items and two dictionaries that refer to these items by their IDs and names, respectively.

    As an example, consider a frog:

    type
      TFrog = class
        ID: Integer;
        Name: string;
        Address: string;
      end;
    

    Like your example, this class has one integer and two string members. We assume that every frog has a unique ID and a unique name. (But two or more frogs may share the same address.)

    Just so we will be able to test the performance, we create a primitive frog generation function:

    function CreateRandomFrog: TFrog;
    const
      FrogFirstNames: array[0..11] of string =
        ('Luke', 'Smith', 'John', 'Maggie', 'Rose', 'Bill', 'Edward', 'Harry',
         'Andrew', 'Michael', 'Molly', 'Arthur');
      FrogLastNames: array[0..7] of string =
        ('Jones', 'Stone', 'Rock', 'Hill', 'Waterfall', 'Sky', 'Flower', 'Rain');
      FrogInitials: array[0..25] of Char = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
      FrogAddressesPrefixes: array[0..3] of string =
        ('Tree', 'Swamp', 'Lawn', 'Lake');
    begin
      Result := TFrog.Create;
      try
        Result.ID := Random(10*N);
        Result.Name := FrogFirstNames[Random(Length(FrogFirstNames))] + #32 +
          FrogInitials[Random(Length(FrogInitials))] + '.' +
          FrogInitials[Random(Length(FrogInitials))] + '.' +
          FrogInitials[Random(Length(FrogInitials))] + '.' + #32 +
          FrogLastNames[Random(Length(FrogLastNames))];
        Result.Address := FrogAddressesPrefixes[Random(Length(FrogAddressesPrefixes))] +
          #32 + Random(Byte.MaxValue).ToString;
      except
        Result.Free;
        raise;
      end;
    end;
    

    This will create frogs like

    ID: 123
    Name: Bill D.H.H. Rock
    Address: Tree 52
    

    We also define a constant

    const
      N = 1000000;
    

    This is the number of frogs we will create at the same time.

    Now, some action: Define a class

    type
      TFrogFarm = class
        Frogs: TObjectList<TFrog>;
        FrogsByID: TDictionary<Integer, TFrog>;
        FrogsByName: TDictionary<string, TFrog>;
        constructor Create;
        destructor Destroy; override;
        procedure TrySearchFarm;
      end;
    

    The idea is that the Frogs list owns the frog objects, while the FrogsByID and FrogsByName dictionaries only refer to the frog objects without owning them. These are dictionaries using the IDs and the names as their keys.

    Implement it like so:

    { TFrogFarm }
    
    constructor TFrogFarm.Create;
    var
      Frog: TFrog;
    begin
    
      // Create the list that owns the frog objects
      Frogs := TObjectList<TFrog>.Create;
    
      // Create the dictionaries that refer to the frog objects without owning them
      FrogsByID := TDictionary<Integer, TFrog>.Create;
      FrogsByName := TDictionary<string, TFrog>.Create;
    
      // Create N random frogs with unique IDs and names
      repeat
        Frog := CreateRandomFrog;
        if not FrogsByID.ContainsKey(Frog.ID) and not FrogsByName.ContainsKey(Frog.Name) then
        begin
          Frogs.Add(Frog); // transfer of ownership
          FrogsByID.Add(Frog.ID, Frog);
          FrogsByName.Add(Frog.Name, Frog);
        end
        else
          Frog.Free; // if this weren't a simple test project, we'd protect this better
      until Frogs.Count = N;
    
    end;
    
    destructor TFrogFarm.Destroy;
    begin
      FreeAndNil(FrogsByName);
      FreeAndNil(FrogsByID);
      FreeAndNil(Frogs);
      inherited;
    end;
    
    procedure TFrogFarm.TrySearchFarm;
    var
      Frog: TFrog;
      S1, S2: string;
      c1, c2, f: Int64;
    begin
    
      QueryPerformanceFrequency(f);
      QueryPerformanceCounter(c1);
    
      if FrogsByID.TryGetValue(100, Frog) then
        S1 := 'There is a frog with ID 100.'#13#10'He or she lives at ' + Frog.Address + '.'
      else
        S1 := 'There is NO frog with ID 100.';
    
      if FrogsByName.TryGetValue('Maggie A.M.D. Flower', Frog) then
        S2 := 'There is a frog named "Maggie A.M.D. Flower".'#13#10'She lives at ' + Frog.Address + '.'
      else
        S2 := 'There is NO frog named "Maggie A.M.D. Flower".';
    
      QueryPerformanceCounter(c2);
    
      ShowMessage(S1 + sLineBreak + sLineBreak + S2 + sLineBreak + sLineBreak +
        'Execution time: ' + Round(1000000*(c2 - c1)/f).ToString + ' µs');
    
    end;
    

    To try this, do

    begin
      Randomize;
      while True do
        with TFrogFarm.Create do
          try
            TrySearchFarm;
          finally
            Free;
          end;
    end;
    

    Finding an element in a dictionary is an O(1) operation, so it is fast even in very large collections. And, indeed, with one million frogs in the farm (N = 1000000), lookup takes about 2 microseconds on my system:

    Screenshot of the program: A message dialog with text "There is NO frog with ID 100. There is a frog named 'Maggie A.M.D. Flower'. She lives at Swamp 211. Execution time: 2 µs".