Search code examples
jsoncsvbooleanjqindicator

Can I output boolean based on values in a list?


Edit: I used the solution provided by @peak to do the following:

$ jq -r --argjson whitelist '["role1", "role2"]' '
select(has("roles") and any(.roles[]; . == "role1" or . == "role2"))
| (reduce ."roles"[] as $r ({}; .[$r]=true)) as $roles
| [.email, .username, .given_name, .family_name, ($roles[$whitelist[]]
| . != null)]
| @csv
' users.json

Added the select() to filter out users who haven't onboarded yet and don't have any roles, and to ensure the users included in the output have at least one of the target roles.

Scenario: user profiles as JSON docs, where each profile has a list object with their assigned roles. Example:

{
  "username": "janedoe",
  "roles": [
    "role1",
    "role4",
    "role5"
  ]
}

The actual data file is an ndjson file, one user object as above per line.

I am only interested in specific roles, say role1, role3, and role4. I want to produce a CSV formatted as:

username,role1?,role3?,role4?

e.g.,

janedoe,true,false,true

The part I haven't figured out is how to output booleans or Y / N in response to the values in the list object. Is this something I can do in jq itself?


Solution

  • With your input, the invocation:

    jq -r --argjson whitelist '["role1", "role3", "role4"]' '
      (["username"] + $whitelist),
      [.username, ($whitelist[] as $w | .roles | index([$w]) != null)]
      | @csv
    '
    

    produces:

    "username","role1","role3","role4"
    "janedoe",true,false,true
    
    

    Notes:

    1. The second last line of the jq filter above could be shortened to:

      [.username, (.roles | index($whitelist[]) != null)]

    2. Presumably if there were more than one user, you'd only want the header row once, in which case the above solution would need to be tweaked.

    Using IN/1

    Because index/1 is not as efficient as it might be, you might like to consider this alternative:

      (["username"] + $whitelist),
      (.roles as $roles | [.username, ($whitelist[] | IN($roles[]) )])
      | @csv
    

    Using a JSON dictionary

    If the number of roles was very large, then it would probably be more efficient to construct a JSON dictionary to avoid repeated linear lookups:

      (reduce .roles[] as $r ({}; .[$r]=true)) as $roles
      | (["username"] + $whitelist),
        [.username, ($roles[$whitelist[]] != null)]
      | @csv
    

    With ndjson as input

    For efficiency, and to ensure there's just one header, you could use inputs with the -n command-line option. Adding the extra fields mentioned in the revised Q, you might end up with:

    jq -nr --argjson whitelist '["role1", "role2"]' '
      ["email", "username", "given_name", "family_name"] as $greenlist
      | ($greenlist + $whitelist),
        (inputs 
         | select(has("roles") and any(.roles[] == $whitelist[]; true))
         | (reduce ."roles"[] as $r ({}; .[$r]=true)) as $roles
         | [ .[$greenlist[]], ($roles[$whitelist[]] != null) ])
      | @csv
    ' users.json