mongodb, mongodb-query

Insert the result of a query that uses an aggregation subquery as a filter into a new collection


How can I implement this query in MongoDB?

select * into NewTable
from SalesInvoice where CustomerID in
   (select CustomerID from SalesInvoice
        group by CustomerID
          having count(CustomerID) > 1
    )

I could build the aggregation query, but I couldn't find a way to use its result as a filter in a new query and then insert the matching documents into a new collection.

Below is the aggregation query:

db.SalesInvoice.aggregate([
  { $group: { _id: "$request.CustomerID", count: { $sum: 1 } } },
  { $match: { count: { $gt: 1 } } }
])

Solution

  • I tried a few things, but they didn't work. I also tried GPT a few times without success. Today I tried GPT again and it finally gave an exact solution.

    To identify duplicate documents based on a specific field and then move all duplicates except one original record to a new collection, you can follow these steps in MongoDB:

    1. Identify duplicates using the $group stage.
    2. Filter out the unique documents using the $match stage.
    3. Move duplicates to a new collection using the $out stage (a sketch of that option follows this list) or by inserting them manually, as the steps below do.
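
    If you prefer the $out route from step 3, a single pipeline can keep the first document of each duplicate group and write the rest to a new collection. This is only a minimal sketch using the same originalCollection, fieldToCheck, and duplicateCollection names as below; note that $out replaces the target collection's existing contents, that "first" is not deterministic without a preceding $sort, and that the duplicates still have to be deleted from the original collection separately.

    // Sketch: write every duplicate except the first of each group to duplicateCollection.
    db.originalCollection.aggregate([
      { $group: { _id: "$fieldToCheck", docs: { $push: "$$ROOT" }, count: { $sum: 1 } } },
      { $match: { count: { $gt: 1 } } },
      // Drop the first document of each group, keep the remaining count - 1
      { $project: { docs: { $slice: ["$docs", 1, { $subtract: ["$count", 1] }] } } },
      { $unwind: "$docs" },
      { $replaceRoot: { newRoot: "$docs" } },
      { $out: "duplicateCollection" }  // $out must be the last stage
    ])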

    Step-by-Step Example

    Let's assume you have a collection named originalCollection and you want to identify duplicates based on a field named fieldToCheck.

    1. Identify Duplicates and Keep One Original Record

    First, identify duplicates and mark them for removal:

    // Find groups of documents that share the same fieldToCheck value
    var duplicates = db.originalCollection.aggregate([
      {
        $group: {
          _id: "$fieldToCheck",
          uniqueIds: { $addToSet: "$_id" },
          count: { $sum: 1 }
        }
      },
      {
        $match: {
          count: { $gt: 1 }
        }
      }
    ]).toArray();
    
    var idsToRemove = [];
    
    duplicates.forEach(function(doc) {
      // Keep the first element and mark the rest for removal
      doc.uniqueIds.shift();
      idsToRemove = idsToRemove.concat(doc.uniqueIds);
    });
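
    Before moving or deleting anything, it can help to inspect what was collected; this optional check is not part of the original answer:

    // Optional sanity check: how many duplicate groups were found and which _ids would be moved/removed
    print("Duplicate groups found: " + duplicates.length);
    print("Documents marked for removal: " + idsToRemove.length);
    printjson(idsToRemove);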
    

    2. Move Duplicates to a New Collection

    Now, move the identified duplicate documents to a new collection:

    var duplicateDocs = db.originalCollection.find({ _id: { $in: idsToRemove } }).toArray();
    
    db.duplicateCollection.insertMany(duplicateDocs);
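
    If the collection happens to contain no duplicates, idsToRemove will be empty and insertMany() rejects an empty array, so a small guard avoids an error. This is a defensive addition, not part of the original answer, and the same applies to the combined script further down:

    // Guard: only copy when there is at least one duplicate _id to move;
    // insertMany() throws on an empty documents array.
    if (idsToRemove.length > 0) {
      var duplicateDocs = db.originalCollection.find({ _id: { $in: idsToRemove } }).toArray();
      db.duplicateCollection.insertMany(duplicateDocs);
    }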
    

    3. Remove Duplicates from the Original Collection

    Finally, remove the duplicate documents from the original collection:

    // remove() is deprecated in newer shells; deleteMany() does the same job here
    db.originalCollection.deleteMany({ _id: { $in: idsToRemove } });
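
    Alternatively, capture the return value of the delete: deleteMany reports a deletedCount, which can be compared against idsToRemove.length as a quick verification. This is an optional addition, not part of the original answer:

    // Optional: confirm that every marked duplicate was actually deleted
    var result = db.originalCollection.deleteMany({ _id: { $in: idsToRemove } });
    print("Removed " + result.deletedCount + " of " + idsToRemove.length + " duplicate documents");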
    

    Combined Script

    Here’s a combined script to do everything at once:

    // Step 1: Identify duplicates
    var duplicates = db.originalCollection.aggregate([
      {
        $group: {
          _id: "$fieldToCheck",
          uniqueIds: { $addToSet: "$_id" },
          count: { $sum: 1 }
        }
      },
      {
        $match: {
          count: { $gt: 1 }
        }
      }
    ]).toArray();
    
    var idsToRemove = [];
    
    // Step 2: Mark duplicates for removal, keep one original
    duplicates.forEach(function(doc) {
      doc.uniqueIds.shift(); // Remove the first element from the array to keep it
      idsToRemove = idsToRemove.concat(doc.uniqueIds);
    });
    
    // Step 3: Move duplicates to a new collection
    var duplicateDocs = db.originalCollection.find({ _id: { $in: idsToRemove } }).toArray();
    db.duplicateCollection.insertMany(duplicateDocs);
    
    // Step 4: Remove duplicates from the original collection
    db.originalCollection.deleteMany({ _id: { $in: idsToRemove } });
    

    This script first identifies the duplicate documents based on fieldToCheck, then moves the duplicates to duplicateCollection, and finally removes them from originalCollection while keeping one original document for each duplicate group.
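
    For completeness, the original question ("select * into NewTable ...") can also be expressed as a single pipeline with no client-side loop. The sketch below assumes the field path $request.CustomerID from the question's aggregation and writes to a (hypothetical) NewTable collection; note that $out replaces that collection's existing contents, and that pushing $$ROOT keeps whole documents in memory per group, so very large groups may be better served by the step-by-step approach above.

    // Copy every invoice whose customer has more than one invoice into NewTable
    db.SalesInvoice.aggregate([
      { $group: { _id: "$request.CustomerID", docs: { $push: "$$ROOT" }, count: { $sum: 1 } } },
      { $match: { count: { $gt: 1 } } },
      { $unwind: "$docs" },
      { $replaceRoot: { newRoot: "$docs" } },
      { $out: "NewTable" }  // must be the last stage; overwrites NewTable
    ])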