
How to split a large CSV file into multiple JSON files using the Miller command line tool?


I am currently using this Miller command to convert a CSV file into a JSON array file:

mlr --icsv --ojson --jlistwrap cat sample.csv > sample.json

It works fine, but the JSON array is too large.

Can Miller split the output into many smaller JSON files of X rows each?

For example, if the original CSV has 100 rows, can I modify the command to output 10 JSON array files, each holding 10 converted CSV rows?

Bonus points if each JSON array can also be wrapped like this:

{
  "instances": 

//JSON ARRAY GOES HERE

}

Solution

  • You could run this:

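    mlr --c2j --jlistwrap put -q '
      begin {
        @batch_size = 1000;   # records per output file
      }
      # 0 for records 1..1000, 1 for records 1001..2000, and so on
      index = int(floor((NR-1) / @batch_size));
      label = fmtnum(index,"%04d");        # zero-padded: 0000, 0001, ...
      filename = "part-".label.".json";
      tee > filename, $*                   # write this record into its batch file
    ' ./input.csv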
    

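    You will get one file per 1000 records, named part-0000.json, part-0001.json, and so on. Because of put -q, nothing is printed to standard output; the records only go to the tee'd files. For the example in the question (100 rows, 10 per file), set @batch_size = 10. This command does not add the "instances" wrapper.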
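The Miller command above does not produce the {"instances": ...} wrapper from the bonus question. As a sketch outside Miller, assuming Python 3 with only the standard library is acceptable, the same batching rule (record NR goes to batch floor((NR-1) / batch_size)) plus the wrapper could look like this. The function names split_csv_to_json and write_batch are hypothetical, chosen for illustration. Unlike Miller, csv.DictReader does no type inference, so every value is written as a JSON string.

```python
import csv
import json
import os


def write_batch(out_dir, index, rows, prefix="part-"):
    """Write one batch as {"instances": [...]} to part-NNNN.json (hypothetical helper)."""
    path = os.path.join(out_dir, f"{prefix}{index:04d}.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"instances": rows}, f, indent=2)
    return path


def split_csv_to_json(csv_path, out_dir, batch_size=1000):
    """Split csv_path into wrapped JSON files of at most batch_size rows each."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    batch = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for nr, row in enumerate(csv.DictReader(f), start=1):
            batch.append(row)
            if len(batch) == batch_size:
                # Same mapping as the Miller expression floor((NR-1) / batch_size)
                paths.append(write_batch(out_dir, (nr - 1) // batch_size, batch))
                batch = []
    if batch:  # leftover rows when the row count is not a multiple of batch_size
        paths.append(write_batch(out_dir, len(paths), batch))
    return paths


if __name__ == "__main__":
    import tempfile

    # Build a 100-row sample CSV, as in the question, and split it 10 rows per file.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "sample.csv")
        with open(src, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["id", "name"])
            for i in range(1, 101):
                writer.writerow([i, f"row{i}"])
        files = split_csv_to_json(src, os.path.join(tmp, "out"), batch_size=10)
        print(len(files), os.path.basename(files[0]))  # prints: 10 part-0000.json
```

The leftover batch gets index len(paths), which is the next number after the last full batch, so file numbering stays contiguous just like the Miller version.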