Search code examples
c#sqlnhibernatepostgresqlcastle-activerecord

Recommended approach to insert many rows with Castle ActiveRecord and ignore any dupes


I have a webmethod that inserts a bunch of recipes into a queue in the database (to store recipes the user is interested in cooking, similar to NetFlix's movie queue). The user is able to check off a bunch of recipes at once and queue them. I have code similar to this:

[WebMethod]
public void EnqueueRecipes(SecurityCredentials credentials, Guid[] recipeIds)
{
    DB.User user = new DB.User(credentials);

    using (new TransactionScope(OnDispose.Commit))
    {
       foreach (Guid rid in recipeIds)
       {
          DB.QueuedRecipe qr = new DB.QueuedRecipe(Guid.NewGuid(), user, new DB.Recipe(rid));
          qr.Create();
       }
    }
}

I have a unique constraint on UserId/RecipeId so a user can only enqueue a recipe once. However, if they happen to select a recipe that's already in their queue I don't really want to bother the user with an error message, I just want to ignore that recipe.

The above code will throw a SQL exception if the unique constraint is violated. What's the best approach to get around this, and simply ignore duplicate rows. My current ideas are:

  • 1) First load the user's entire queue from the database and check that list first. If the recipe already exists, just continue in the for loop. Pros: No unnecessary SQL inserts get sent to the database. Cons: Slower, especially if the user has a big queue.
  • 2) Don't use ActiveRecord and instead pass the entire recipeIds array into a SQL function. This function will check if each row exists first. Pros: Potentially fast, lets SQL handle all the dirty work. Cons: Breaks ActiveRecord pattern and requires new DB code, which is often harder to maintain and costlier to implement.
  • 3) CreateAndFlush after each loop. Basically, don't run this entire loop in a single transaction. Commit each row as it's added and catch SQL errors and ignore. Pros: Low startup cost, and doesn't require new SQL backend code. Cons: Potentially slower for inserting lots of rows into the database at once, though it's doubtful a user would ever submit over a dozen or so new recipes at once.

Are there any other little tricks with Castle or the NHibernate framework? Also, my SQL backend is PostgreSQL 9.0. Thanks!

Update:

I took a shot at the first approach and it seems to work pretty well. It occured to me I don't have to load the entire queue, just the ones that appear in recipeIds. I believe my foreach() loop is now O(n^2) depending on the efficiency of List<Guid>::Contains() but I think this is probably decent for the sizes I'll be working with.

//Check for dupes
DB.QueuedRecipe[] dbRecipes = DB.QueuedRecipe.FindAll(Expression.In("Recipe",
   (from r in recipeIds select new DB.Recipe(r)).ToArray()
));

List<Guid> existing = (from r in dbRecipes select r.Recipe.RecipeId).ToList();

using (new TransactionScope(OnDispose.Commit))
{
   foreach (Guid rid in recipeIds)
   {
      if (existing.Contains(rid))
         continue;

      DB.QueuedRecipe qr = new DB.QueuedRecipe(Guid.NewGuid(), user, new DB.Recipe(rid));
      qr.Create();
   }
}

Solution

  • You can do that with a single SQL statement:

    INSERT INTO user_recipe
    SELECT new_UserId, new_RecipeId
    FROM   user_recipe
    WHERE  NOT EXISTS (
       SELECT *
       FROM   user_recipe
       WHERE  (UserId, RecipeId) = (new_UserId, new_RecipeId)
       );
    

    The SELECT only returns the row if it doesn't already exist, so it will only be inserted in this case.


    Solution for bulk inserts

    If you have a long list of recipes to insert at once, you could:

    CREATE TEMP TABLE i(userId int, recipeid int) ON COMMIT DROP;
    
    INSERT INTO i VALUES
    (1,2), (2,4), (2,4), (2,7), (2,43), (23,113), (223,133);
    
    INSERT INTO user_recipe
    SELECT DISTINCT i.*  -- remove dupes from the insert candidates themselves
    FROM   i
    LEFT   JOIN user_recipe u USING (userid, recipeid)
    WHERE  u.userid IS NULL;
    

    Solution for inserting a handful at a time

    Temporary table would be an overkill for just a few records, as Mike commented.

    INSERT INTO user_recipe
    SELECT i.* 
    FROM  (
        SELECT DISTINCT *     -- only if you need to remove possible dupes
        FROM (
           VALUES (1::int, 2::int)
              ,(2, 3)
              ,(2, 4)
              ,(2, 4)            -- dupe will be removed
              ,(2, 43)
              ,(23, 113)
              ,(223, 133)
           ) i(userid, recipeid)
        ) i
    LEFT   JOIN user_recipe u USING (userid, recipeid)
    WHERE  u.userid IS NULL;