How to skip first n objects in jq input


I have a VERY large stream of objects, which I am trying to import into MongoDB. I keep getting a broken pipe after about 10k objects, so I would like to be able to update my import script to skip the already imported objects and begin with the first one that was missed.

It seems to me that the tool for this would be jq. What I need is a way to skip (yield empty) all items before the nth, and then output the rest as-is.

I've tried using foreach to maintain an object counter, but I keep ending up with 1 as the value of the counter, for all objects in my small test sample (using a bash here document):

$ jq 'foreach . as $item (0; (.+1); [ . , if . < 2 then empty else $item end ])' <<"end"
> { "item": "first" }
> { "item": "second" }
> { "item": "third" }
> { "item": "fourth" }
> end

The output from this is:

[
  1
]
[
  1
]
[
  1
]
[
  1
]

Any suggestions would be most welcome.


Solution

    # Emit every item of stream except the first n.
    def skip(n; stream):
      foreach stream as $s (0; .+1; select(. > n) | $s);
    

    Example:

    skip(1000; inputs)
    

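    (When using inputs and/or input, you'll almost always want the -n command-line option. Without it, jq runs the whole program once per input object with . set to that object, so the foreach in the question restarts its counter at 0 every time and never gets past 1.)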
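
For a complete command line, here is a minimal sketch that wraps the definition in a shell function and runs it on the sample data from the question. The function name skip_first is hypothetical (it is not part of jq), and passing the count with --argjson is just one way to avoid editing the program text.

```shell
# skip_first N: read JSON values on stdin and print all but the first N.
# (skip_first is a hypothetical helper name, not part of jq.)
skip_first() {
  # -n: do not consume an input as "."; -c: one compact object per line.
  jq -nc --argjson n "$1" '
    def skip(n; stream):
      foreach stream as $s (0; .+1; select(. > n) | $s);
    skip($n; inputs)
  '
}

skip_first 2 <<"end"
{ "item": "first" }
{ "item": "second" }
{ "item": "third" }
{ "item": "fourth" }
end
```

With that sample it should print only the third and fourth objects, one per line, which can then be piped on to the import step.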

    Sledgehammer Approach

    try (range(0; 1000) | input | empty), inputs
    

    In this case, the try is necessary to avoid an error should there be fewer than the requested number of items.
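
To see the effect of the try, the sketch below feeds the one-liner fewer objects than it is asked to skip. The function name skip_sledgehammer is hypothetical, and the count is passed with --argjson instead of being hard-coded as 1000.

```shell
# skip_sledgehammer N: read JSON values on stdin, discard the first N with
# input, then pass the rest through with inputs.
# (skip_sledgehammer is a hypothetical helper name, not part of jq.)
skip_sledgehammer() {
  jq -nc --argjson n "$1" 'try (range(0; $n) | input | empty), inputs'
}

# Only two objects arrive but three are to be skipped: the try absorbs the
# error raised by the third call to input, so nothing is printed.
printf '{"item":"first"}\n{"item":"second"}\n' | skip_sledgehammer 3
```

Without the try, the same command would stop with an error as soon as input ran out of data.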