Search code examples
jsoncsvmiller

How can I clean up empty fields when converting CSV to JSON with Miller?


I have several CSV files of item data for a game I'm messing around with that I need to convert to JSON for consumption. The data can be quite irregular with several empty fields per record, which makes for sort of ugly JSON output.

Example with dummy values:

Id,Name,Value,Type,Properties/1,Properties/2,Properties/3,Properties/4
01:Foo:13,Foo,13,ACME,CanExplode,IsRocket,,
02:Bar:42,Bar,42,,IsRocket,,,
03:Baz:37,Baz,37,BlackMesa,CanExplode,IsAlive,IsHungry,

Converted output:

[
{
  "Id": "01:Foo:13",
  "Name": "Foo",
  "Value": 13,
  "Type": "ACME",
  "Properties": ["CanExplode", "IsRocket", ""]
},
{
  "Id": "02:Bar:42",
  "Name": "Bar",
  "Value": 42,
  "Type": "",
  "Properties": ["IsRocket", "", ""]
},
{
  "Id": "03:Baz:37",
  "Name": "Baz",
  "Value": 37,
  "Type": "BlackMesa",
  "Properties": ["CanExplode", "IsAlive", "IsHungry"]
}
]

So far I've been quite successful with using Miller. I've managed to remove completely empty columns from the CSV as well as aggregate the Properties/X columns into a single array.

But now I'd like to do two more things to improve the output format to make consuming the JSON easier:

  • remove empty strings "" from the Properties array
  • replace the other empty strings "" (e.g. Type of the second record) with null

Desired output:

[
{
  "Id": "01:Foo:13",
  "Name": "Foo",
  "Value": 13,
  "Type": "ACME",
  "Properties": ["CanExplode", "IsRocket"]
},
{
  "Id": "02:Bar:42",
  "Name": "Bar",
  "Value": 42,
  "Type": null,
  "Properties": ["IsRocket"]
},
{
  "Id": "03:Baz:37",
  "Name": "Baz",
  "Value": 37,
  "Type": "BlackMesa",
  "Properties": ["CanExplode", "IsAlive", "IsHungry"]
}
]

Is there a way to achieve that with Miller?

My current commands are:

  • mlr -I --csv remove-empty-columns file.csv to clean up the columns
  • mlr --icsv --ojson --jflatsep '/' --jlistwrap cat file.csv > file.json for the conversion

Solution

  • It's not probably the way you want to do it. I use also jq.

    Running

    mlr --c2j  --jflatsep '/' --jlistwrap remove-empty-columns then cat input.csv | \
    jq '.[].Properties|=map(select(length > 0))' | \
    jq '.[].Type|=(if . == "" then null else . end)'
    

    you will have

    [
      {
        "Id": "01:Foo:13",
        "Name": "Foo",
        "Value": 13,
        "Type": "ACME",
        "Properties": [
          "CanExplode",
          "IsRocket"
        ]
      },
      {
        "Id": "02:Bar:42",
        "Name": "Bar",
        "Value": 42,
        "Type": null,
        "Properties": [
          "IsRocket"
        ]
      },
      {
        "Id": "03:Baz:37",
        "Name": "Baz",
        "Value": 37,
        "Type": "BlackMesa",
        "Properties": [
          "CanExplode",
          "IsAlive",
          "IsHungry"
        ]
      }
    ]