Search code examples
amazon-neptuneopencypher

AWS Neptune simple opencypher count query takes too long


I have a graph database in AWS neptune which has 60M nodes. A simple count query to count all nodes takes ~6-7 minutes.

query is:

MATCH (n)
RETURN count(n) as count

Is 6 minutes to count 60 million nodes normal? what can I do to make it faster?

Explain (debug) result for this query:

Query:
// all nodes count
MATCH (n)
RETURN count(n) as cnt


ID  Out #1  Out #2  Name    Arguments   Mode    Units In    Units Out   Ratio   Time (ms)   Chunks In   Chunks Out  Units In (per chunk)    Units Out (per chunk)   Invocations OutWaitMs   Out Blocked Rate [M/s]  GCElapsedMs blocksIncGC blocksDecGC progressCount   init [ms]   done [ms]   finalize [ms]
0   1   -   SolutionInjection   solutions=[{}]  -   0   1   0.00    0   0   0   0.00    0.00    0   0   0   NaN
1   2   -   DFESubquery subQuery=subQuery1
partitionId=0
details= ====> DFE execution time toPASTModel [micros]=213 accepted [micros]=60 ready [micros]=214 running [micros]=68035522 finished [micros]=0 ===> DFE execution time (measured in DFENode) -> setupTime [ms]=0 -> executionTime [ms]=68038 -> resultReadTime [ms]=0 ====> Original AST: DFEJoinGroupNode[]( children=[ DFEProjectionNode[NONE]( projectedVars=[?cnt], child=DFEAggregationNode[NONE]( groupByVars=[], aggregateExpressions=[ DFEAggregateExpression(aggregate=DFEBindNode(countWithoutNulls(?n) AS ?cnt), isDistinct=false)], child=DFEJoinGroupNode[]( children=[ DFEPatternNode((?n, TermId(782U)[http://www.w3.org/1999/02/22-rdf-syntax-ns#type], ?n_label1, TermId(526U)[http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph]) . project DISTINCT[?n] {rangeCountEstimate=78682901})], opInfo=none), opInfo=none), opInfo=none)], opInfo=none) ====> Preprocessed AST: DFEProjectionNode[NONE]( projectedVars=[?cnt], child=DFEAggregationNode[NONE]( groupByVars=[], aggregateExpressions=[ DFEAggregateExpression(aggregate=DFEBindNode(countWithoutNulls(?n) AS ?cnt), isDistinct=false)], child=DFEPatternNode((?n, TermId(782U)[http://www.w3.org/1999/02/22-rdf-syntax-ns#type], ?n_label1, TermId(526U)[http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph]) . project DISTINCT[?n] {rangeCountEstimate=78682901}), opInfo=(type=NoneOperatorStub, cost=(exp=(empty),wc=(empty)))), opInfo=(type=SubQuery, cost=(exp=(empty),wc=(empty)))) ===> DFE configuration (given) solutionChunkSize=100000 outputQueueSize=20 numComputeCores=3 maxParallelIO=5 numInitialPermits=0 frontiersAsInFilters=true partitionId=0 isExplainRequested=true languageSpecifier=Open_Cypher planVariant=all/BLOCKING readForUpdate=false ====> DFE configuration (reported) numComputeCores=3 numIOThreads=1 numInitialPermits=1 permitsSecured=1732 ===> Top level Statistics & operator histogram ==> Statistics -> 68032185 / 68026188 micros total elapsed (incl. wait / excl. wait) -> 68032 / 68026 millis total elapsed (incl. wait / excl. wait) -> 68 / 68 secs total elapsed (incl. wait / excl. wait) ==> GC Summary -> 40.68ms spent in GC (0.06% of total time) ==> Operator histogram (all times are excluding wait) -> Total Operator #instances: 4 Operator │ Time(ms) │ Time(%) │ rowsIn │ rowsOut │ chunksIn │ chunksOut │ instances │ invocation │ in(M/s) │ out(M/s) │ time/chunkIn(ms) │ time/chunkOut(ms) │ time/invoc(ms) │ GC(ms) │ GC(%) ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── pipelineScan │ 67518 │ 99.25% │ 0 │ 60341570 │ 0 │ 1751 │ 1 │ 1 │ 0.00 │ 0.89 │ - │ 38.56 │ 67518 │ 17.67 │ 0.03 merge │ 415 │ 0.61% │ 60341570 │ 60341570 │ 1751 │ 1 │ 1 │ 1 │ 145.53 │ 145.53 │ 0.24 │ 415 │ 415 │ 6.05 │ 1.46 daslReduce │ 93.86 │ 0.14% │ 60341570 │ 1 │ 1 │ 1 │ 1 │ 1 │ 642.91 │ 0.00 │ 93.86 │ 93.86 │ 93.86 │ 16.94 │ 18.05 drain │ 0.06 │ 0.00% │ 1 │ 0 │ 1 │ 0 │ 1 │ 1 │ 0.02 │ 0.00 │ 0.06 │ - │ 0.06 │ 0.02 │ 25.00 -   0   1   0.00    68048   0   1   0.00    1.00    2   0   0   0.00
2   -   -   TermResolution  vars=[?cnt] id2value_opencypher 1   1   1.00    1.00    1   1   1.00    1.00    1   0   0   0.00
subQuery1
ID  Out #1  Out #2  Name    Arguments   Mode    Units In    Units Out   Ratio   Time (ms)   Chunks In   Chunks Out  Units In (per chunk)    Units Out (per chunk)   Invocations OutWaitMs   Out Blocked Rate [M/s]  GCElapsedMs blocksIncGC blocksDecGC progressCount   init [ms]   done [ms]   finalize [ms]
0   1   -   DFEPipelineScan pattern=distinct ?n (?n,rdf:type,?n_label1,neptune:DefaultNamedGraph)
outSchema=[?n]
patternEstimate=78682901    -   0   60341570    0.00    67518   0   1751    0.00    34461.21    1   5.97    0   0.89
1   2   -   DFEMergeChunks  inSchema=[?n]
outSchema=[?n]  -   60341570    60341570    1.00    415 1751    1   34461.21    60341570.00 1   0.01    0   145.53
2   3   -   DFEReduce   functor=countWithoutNulls(?n)
inSchema=[?n]
outSchema=[?cnt]    -   60341570    1   0.00    93.86   1   1   60341570.00 1.00    1   0.01    0   0.00
3   -   -   DFEDrain    inSchema=[?cnt]
outSchema=[?cnt]    -   1   0   0.00    0.06    1   0   1.00    0.00    1   0   0   0.00

Explain (detail) results: enter image description here

RUN THIS SNIPPET BELOW FOR BETTER READABILITY

<!DOCTYPE html>
<html>
<body>




<h3>Query:</h3>
<p>// all nodes count<br/>MATCH (n)<br/>RETURN count(n) as cnt<br/></p>
<br/>


<table border="1px">
    <thead>
        <tr>
                        <th>ID</th>
                        <th>Out #1</th>
                        <th>Out #2</th>
                        <th>Name</th>
                        <th>Arguments</th>
                        <th>Mode</th>
                        <th>Units In</th>
                        <th>Units Out</th>
                        <th>Ratio</th>
                        <th>Time (ms)</th>
                        <th>Chunks In</th>
                        <th>Chunks Out</th>
                        <th>Units In (per chunk)</th>
                        <th>Units Out (per chunk)</th>
                        <th>Invocations</th>
                        <th>OutWaitMs</th>
                        <th>Out Blocked</th>
                        <th>Rate [M/s]</th>
                        <th>GCElapsedMs</th>
                        <th>blocksIncGC</th>
                        <th>blocksDecGC</th>
                        <th>progressCount</th>
                        <th>init [ms]</th>
                        <th> done [ms]</th>
                        <th>finalize [ms]</th>
                    </tr>
    </thead>

    <tbody>
                <tr>
            <td>0</td>
            <td>1</td>
            <td>-</td>
            <td>SolutionInjection</td>
            <td>solutions=[{}]</td>
            <td>-</td>

                        <td>0</td>
            <td>1</td>
            <td>0.00</td>
            <td>0</td>
                        <td>0</td>
            <td>0</td>
            <td>0.00</td>
            <td>0.00</td>
            <td>0</td>
            <td>0</td>
            <td>0</td>
            <td>NaN</td>
                                </tr>
            <tr>
            <td>1</td>
            <td>2</td>
            <td>-</td>
            <td>DFESubquery</td>
            <td>subQuery=subQuery1<br>partitionId=0<br>details=

====&gt; DFE execution time
toPASTModel [micros]=213
accepted [micros]=60
ready [micros]=214
running [micros]=68035522
finished [micros]=0


===&gt; DFE execution time (measured in DFENode)
-&gt; setupTime [ms]=0
-&gt; executionTime [ms]=68038
-&gt; resultReadTime [ms]=0




====&gt; Original AST:
DFEJoinGroupNode[](
  children=[
    DFEProjectionNode[NONE](
      projectedVars=[?cnt],
      child=DFEAggregationNode[NONE](
        groupByVars=[],
        aggregateExpressions=[
          DFEAggregateExpression(aggregate=DFEBindNode(countWithoutNulls(?n) AS ?cnt), isDistinct=false)],
        child=DFEJoinGroupNode[](
          children=[
            DFEPatternNode((?n, TermId(782U)[http://www.w3.org/1999/02/22-rdf-syntax-ns#type], ?n_label1, TermId(526U)[http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph]) . project DISTINCT[?n] {rangeCountEstimate=78682901})],
          opInfo=none),
        opInfo=none),
      opInfo=none)],
  opInfo=none)

====&gt; Preprocessed AST:
DFEProjectionNode[NONE](
  projectedVars=[?cnt],
  child=DFEAggregationNode[NONE](
    groupByVars=[],
    aggregateExpressions=[
      DFEAggregateExpression(aggregate=DFEBindNode(countWithoutNulls(?n) AS ?cnt), isDistinct=false)],
    child=DFEPatternNode((?n, TermId(782U)[http://www.w3.org/1999/02/22-rdf-syntax-ns#type], ?n_label1, TermId(526U)[http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph]) . project DISTINCT[?n] {rangeCountEstimate=78682901}),
    opInfo=(type=NoneOperatorStub, cost=(exp=(empty),wc=(empty)))),
  opInfo=(type=SubQuery, cost=(exp=(empty),wc=(empty))))

===&gt; DFE configuration (given)
solutionChunkSize=100000
outputQueueSize=20
numComputeCores=3
maxParallelIO=5
numInitialPermits=0
frontiersAsInFilters=true
partitionId=0
isExplainRequested=true
languageSpecifier=Open_Cypher
planVariant=all/BLOCKING
readForUpdate=false


====&gt; DFE configuration (reported)
numComputeCores=3
numIOThreads=1
numInitialPermits=1
permitsSecured=1732


===&gt; Top level Statistics &amp; operator histogram
==&gt; Statistics
-&gt; 68032185 / 68026188 micros total elapsed (incl. wait / excl. wait)
-&gt; 68032 / 68026 millis total elapsed (incl. wait / excl. wait)
-&gt; 68 / 68 secs total elapsed (incl. wait / excl. wait)

==&gt; GC Summary
-&gt; 40.68ms spent in GC (0.06% of total time)

==&gt; Operator histogram (all times are excluding wait)
-&gt; Total Operator #instances: 4

            Operator  │   Time(ms)  │  Time(%)  │     rowsIn  │    rowsOut  │  chunksIn  │ chunksOut  │  instances  │ invocation  │    in(M/s)  │   out(M/s)  │ time/chunkIn(ms)  │ time/chunkOut(ms)  │    time/invoc(ms)  │     GC(ms)  │  GC(%)
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
        pipelineScan  │      67518  │   99.25%  │          0  │   60341570  │         0  │      1751  │          1  │          1  │       0.00  │       0.89  │                -  │             38.56  │             67518  │      17.67  │   0.03
               merge  │        415  │    0.61%  │   60341570  │   60341570  │      1751  │         1  │          1  │          1  │     145.53  │     145.53  │             0.24  │               415  │               415  │       6.05  │   1.46
          daslReduce  │      93.86  │    0.14%  │   60341570  │          1  │         1  │         1  │          1  │          1  │     642.91  │       0.00  │            93.86  │             93.86  │             93.86  │      16.94  │  18.05
               drain  │       0.06  │    0.00%  │          1  │          0  │         1  │         0  │          1  │          1  │       0.02  │       0.00  │             0.06  │                 -  │              0.06  │       0.02  │  25.00</td>
            <td>-</td>

                        <td>0</td>
            <td>1</td>
            <td>0.00</td>
            <td>68048</td>
                        <td>0</td>
            <td>1</td>
            <td>0.00</td>
            <td>1.00</td>
            <td>2</td>
            <td>0</td>
            <td>0</td>
            <td>0.00</td>
                                </tr>
            <tr>
            <td>2</td>
            <td>-</td>
            <td>-</td>
            <td>TermResolution</td>
            <td>vars=[?cnt]</td>
            <td>id2value_opencypher</td>

                        <td>1</td>
            <td>1</td>
            <td>1.00</td>
            <td>1.00</td>
                        <td>1</td>
            <td>1</td>
            <td>1.00</td>
            <td>1.00</td>
            <td>1</td>
            <td>0</td>
            <td>0</td>
            <td>0.00</td>
                                </tr>
        </tbody>
</table>

<h3>subQuery1</h3>


<table border="1px">
    <thead>
        <tr>
                        <th>ID</th>
                        <th>Out #1</th>
                        <th>Out #2</th>
                        <th>Name</th>
                        <th>Arguments</th>
                        <th>Mode</th>
                        <th>Units In</th>
                        <th>Units Out</th>
                        <th>Ratio</th>
                        <th>Time (ms)</th>
                        <th>Chunks In</th>
                        <th>Chunks Out</th>
                        <th>Units In (per chunk)</th>
                        <th>Units Out (per chunk)</th>
                        <th>Invocations</th>
                        <th>OutWaitMs</th>
                        <th>Out Blocked</th>
                        <th>Rate [M/s]</th>
                        <th>GCElapsedMs</th>
                        <th>blocksIncGC</th>
                        <th>blocksDecGC</th>
                        <th>progressCount</th>
                        <th>init [ms]</th>
                        <th> done [ms]</th>
                        <th>finalize [ms]</th>
                    </tr>
    </thead>

    <tbody>
                <tr>
            <td>0</td>
            <td>1</td>
            <td>-</td>
            <td>DFEPipelineScan</td>
            <td>pattern=distinct ?n (?n,rdf:type,?n_label1,neptune:DefaultNamedGraph)<br>outSchema=[?n]<br>patternEstimate=78682901</td>
            <td>-</td>

                        <td>0</td>
            <td>60341570</td>
            <td>0.00</td>
            <td>67518</td>
                        <td>0</td>
            <td>1751</td>
            <td>0.00</td>
            <td>34461.21</td>
            <td>1</td>
            <td>5.97</td>
            <td>0</td>
            <td>0.89</td>
                                </tr>
            <tr>
            <td>1</td>
            <td>2</td>
            <td>-</td>
            <td>DFEMergeChunks</td>
            <td>inSchema=[?n]<br>outSchema=[?n]</td>
            <td>-</td>

                        <td>60341570</td>
            <td>60341570</td>
            <td>1.00</td>
            <td>415</td>
                        <td>1751</td>
            <td>1</td>
            <td>34461.21</td>
            <td>60341570.00</td>
            <td>1</td>
            <td>0.01</td>
            <td>0</td>
            <td>145.53</td>
                                </tr>
            <tr>
            <td>2</td>
            <td>3</td>
            <td>-</td>
            <td>DFEReduce</td>
            <td>functor=countWithoutNulls(?n)<br>inSchema=[?n]<br>outSchema=[?cnt]</td>
            <td>-</td>

                        <td>60341570</td>
            <td>1</td>
            <td>0.00</td>
            <td>93.86</td>
                        <td>1</td>
            <td>1</td>
            <td>60341570.00</td>
            <td>1.00</td>
            <td>1</td>
            <td>0.01</td>
            <td>0</td>
            <td>0.00</td>
                                </tr>
            <tr>
            <td>3</td>
            <td>-</td>
            <td>-</td>
            <td>DFEDrain</td>
            <td>inSchema=[?cnt]<br>outSchema=[?cnt]</td>
            <td>-</td>

                        <td>1</td>
            <td>0</td>
            <td>0.00</td>
            <td>0.06</td>
                        <td>1</td>
            <td>0</td>
            <td>1.00</td>
            <td>0.00</td>
            <td>1</td>
            <td>0</td>
            <td>0</td>
            <td>0.00</td>
                                </tr>
        </tbody>
</table>
</body>
</html>


Solution

  • Starting with Neptune version 1.2.1.0, we now have a Summary API can can provide both vertex and edge counts. It will also provide he total list of distinct vertex/edge labels and the total counts by property label.

    https://aws.amazon.com/about-aws/whats-new/2023/03/amazon-neptune-graph-summary-api/