mongodb, mongodb-query

Insert the result of a query that uses an aggregation subquery as a filter into a new collection


How can I implement this query in MongoDB?

select * into NewTable
from SalesInvoice where CustomerID in
   (select CustomerID from SalesInvoice
        group by CustomerID
          having count(CustomerID) > 1
    )

I could build the aggregation query, but I couldn't find a way to use its result as a filter in a new query and then insert the matching documents into a new collection.

Below is the aggregation query:

db.SalesInvoice.aggregate([
  { $group: { _id: "$request.CustomerID", count: { $sum: 1 } } },
  { $match: { count: { $gt: 1 } } }
])

Solution

  • I tried a few things, but they didn't work. I also tried GPT a few times without success. Today I tried GPT again and it finally gave an exact solution.

    To identify duplicate documents based on a specific field and then move all duplicates except one original record to a new collection, you can follow these steps in MongoDB:

    1. Identify duplicates using the $group stage.
    2. Filter out the unique documents using the $match stage.
    3. Move duplicates to a new collection using the $out stage (a sketch of that option follows this list) or by inserting them manually, as the steps below do.
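
    If you prefer the $out route from step 3, a single pipeline can keep the first document of each duplicate group and write the rest to a new collection. This is only a minimal sketch using the same originalCollection, fieldToCheck, and duplicateCollection names as below; note that $out replaces the target collection's existing contents, that "first" is not deterministic without a preceding $sort, and that the duplicates still have to be deleted from the original collection separately.

    // Sketch: write every duplicate except the first of each group to duplicateCollection.
    db.originalCollection.aggregate([
      { $group: { _id: "$fieldToCheck", docs: { $push: "$$ROOT" }, count: { $sum: 1 } } },
      { $match: { count: { $gt: 1 } } },
      // Drop the first document of each group, keep the remaining count - 1
      { $project: { docs: { $slice: ["$docs", 1, { $subtract: ["$count", 1] }] } } },
      { $unwind: "$docs" },
      { $replaceRoot: { newRoot: "$docs" } },
      { $out: "duplicateCollection" }  // $out must be the last stage
    ])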

    Step-by-Step Example

    Let's assume you have a collection named originalCollection and you want to identify duplicates based on a field named fieldToCheck.

    1. Identify Duplicates and Keep One Original Record

    First, identify duplicates and mark them for removal:

    // Find groups of documents that share the same fieldToCheck value
    var duplicates = db.originalCollection.aggregate([
      {
        $group: {
          _id: "$fieldToCheck",
          uniqueIds: { $addToSet: "$_id" },
          count: { $sum: 1 }
        }
      },
      {
        $match: {
          count: { $gt: 1 }
        }
      }
    ]).toArray();
    
    var idsToRemove = [];
    
    duplicates.forEach(function(doc) {
      // Keep the first element and mark the rest for removal
      doc.uniqueIds.shift();
      idsToRemove = idsToRemove.concat(doc.uniqueIds);
    });
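
    Before moving or deleting anything, it can help to inspect what was collected; this optional check is not part of the original answer:

    // Optional sanity check: how many duplicate groups were found and which _ids would be moved/removed
    print("Duplicate groups found: " + duplicates.length);
    print("Documents marked for removal: " + idsToRemove.length);
    printjson(idsToRemove);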
    

    2. Move Duplicates to a New Collection

    Now, move the identified duplicate documents to a new collection:

    var duplicateDocs = db.originalCollection.find({ _id: { $in: idsToRemove } }).toArray();
    
    db.duplicateCollection.insertMany(duplicateDocs);
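
    If the collection happens to contain no duplicates, idsToRemove will be empty and insertMany() rejects an empty array, so a small guard avoids an error. This is a defensive addition, not part of the original answer, and the same applies to the combined script further down:

    // Guard: only copy when there is at least one duplicate _id to move;
    // insertMany() throws on an empty documents array.
    if (idsToRemove.length > 0) {
      var duplicateDocs = db.originalCollection.find({ _id: { $in: idsToRemove } }).toArray();
      db.duplicateCollection.insertMany(duplicateDocs);
    }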
    

    3. Remove Duplicates from the Original Collection

    Finally, remove the duplicate documents from the original collection:

    // remove() is deprecated in newer shells; deleteMany() does the same job here
    db.originalCollection.deleteMany({ _id: { $in: idsToRemove } });
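
    Alternatively, capture the return value of the delete: deleteMany reports a deletedCount, which can be compared against idsToRemove.length as a quick verification. This is an optional addition, not part of the original answer:

    // Optional: confirm that every marked duplicate was actually deleted
    var result = db.originalCollection.deleteMany({ _id: { $in: idsToRemove } });
    print("Removed " + result.deletedCount + " of " + idsToRemove.length + " duplicate documents");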
    

    Combined Script

    Here’s a combined script to do everything at once:

    // Step 1: Identify duplicates
    var duplicates = db.originalCollection.aggregate([
      {
        $group: {
          _id: "$fieldToCheck",
          uniqueIds: { $addToSet: "$_id" },
          count: { $sum: 1 }
        }
      },
      {
        $match: {
          count: { $gt: 1 }
        }
      }
    ]).toArray();
    
    var idsToRemove = [];
    
    // Step 2: Mark duplicates for removal, keep one original
    duplicates.forEach(function(doc) {
      doc.uniqueIds.shift(); // Remove the first element from the array to keep it
      idsToRemove = idsToRemove.concat(doc.uniqueIds);
    });
    
    // Step 3: Move duplicates to a new collection
    var duplicateDocs = db.originalCollection.find({ _id: { $in: idsToRemove } }).toArray();
    db.duplicateCollection.insertMany(duplicateDocs);
    
    // Step 4: Remove duplicates from the original collection
    db.originalCollection.deleteMany({ _id: { $in: idsToRemove } });
    

    This script first identifies the duplicate documents based on fieldToCheck, then moves the duplicates to duplicateCollection, and finally removes them from originalCollection while keeping one original document for each duplicate group.
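
    For completeness, the original question ("select * into NewTable ...") can also be expressed as a single pipeline with no client-side loop. The sketch below assumes the field path $request.CustomerID from the question's aggregation and writes to a (hypothetical) NewTable collection; note that $out replaces that collection's existing contents, and that pushing $$ROOT keeps whole documents in memory per group, so very large groups may be better served by the step-by-step approach above.

    // Copy every invoice whose customer has more than one invoice into NewTable
    db.SalesInvoice.aggregate([
      { $group: { _id: "$request.CustomerID", docs: { $push: "$$ROOT" }, count: { $sum: 1 } } },
      { $match: { count: { $gt: 1 } } },
      { $unwind: "$docs" },
      { $replaceRoot: { newRoot: "$docs" } },
      { $out: "NewTable" }  // must be the last stage; overwrites NewTable
    ])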