arangodb aql

How to find duplicate documents?


It's very strange that I could not find an answer to such a simple question, either in the documentation or here: how do I find duplicated records in a collection? For example, I need to find the documents with a duplicated id among the following:

{"id": 1, "name": "Mike"},
{"id": 2, "name": "Jow"},
{"id": 3, "name": "Piter"},
{"id": 1, "name": "Robert"}

I need a query that returns the two documents that share the same id (id: 1 in my case).


Solution

  • Have a look at the AQL COLLECT operation. It groups documents by a value, such as your id key, and with WITH COUNT INTO it returns how many documents fall into each group, so any group with a count above 1 holds duplicates.

    ArangoDB AQL - COLLECT

    LET is handy for breaking an AQL query into smaller steps: it stores the result of a subquery in a variable that later parts of the query can work with.

    It is probably possible to collapse all of this into a single query, but splitting it up makes each step easier to follow.

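    // Step 1: collect every id value that occurs more than once
    LET duplicates = (
        FOR d IN myCollection
        COLLECT id = d.id WITH COUNT INTO count
        FILTER count > 1
        RETURN {
            id: id,
            count: count
        }
    )

    // Step 2: return the full documents whose id is one of the duplicated values
    FOR d IN duplicates
    FOR m IN myCollection
    FILTER d.id == m.id
    RETURN m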
    

    For the sample documents, this returns:

    [
      {
        "_key": "416140",
        "_id": "myCollection/416140",
        "_rev": "_au4sAfS--_",
        "id": 1,
        "name": "Mike"
      },
      {
        "_key": "416176",
        "_id": "myCollection/416176",
        "_rev": "_au4sici--_",
        "id": 1,
        "name": "Robert"
      }
    ]
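
    As for collapsing everything into one query, here is a minimal sketch, assuming the same myCollection and id attribute as above. It lets COLLECT gather the documents of each group with INTO and keeps only the groups that hold more than one document. The variable names docs and doc are illustrative, and I have not compared its performance with the two-step version.

    ```aql
    FOR d IN myCollection
        // Group by id and keep the grouped documents themselves in docs
        COLLECT id = d.id INTO docs = d
        // A group with more than one document means the id is duplicated
        FILTER LENGTH(docs) > 1
        // Flatten the groups back into individual documents
        FOR doc IN docs
            RETURN doc
    ```

    For the sample data this should yield the same two documents with id 1. The two-step version only keeps counts in the first pass, while this one carries the grouped documents along, so on a large collection it may use more memory.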