Edit: I used the solution provided by @peak to do the following:
$ jq -r --argjson whitelist '["role1", "role2"]' '
select(has("roles") and any(.roles[]; . == "role1" or . == "role2"))
| (reduce ."roles"[] as $r ({}; .[$r]=true)) as $roles
| [.email, .username, .given_name, .family_name, ($roles[$whitelist[]]
| . != null)]
| @csv
' users.json
Added the select()
to filter out users who haven't onboarded yet and don't have any roles, and to ensure the users included in the output have at least one of the target roles.
Scenario: user profiles as JSON docs, where each profile has a list object with their assigned roles. Example:
{
"username": "janedoe",
"roles": [
"role1",
"role4",
"role5"
]
}
The actual data file is an ndjson file, one user object as above per line.
I am only interested in specific roles, say role1
, role3
, and role4
. I want to produce a CSV formatted as:
username,role1?,role3?,role4?
e.g.,
janedoe,true,false,true
The part I haven't figured out is how to output booleans or Y / N in response to the values in the list object. Is this something I can do in jq
itself?
With your input, the invocation:
jq -r --argjson whitelist '["role1", "role3", "role4"]' '
(["username"] + $whitelist),
[.username, ($whitelist[] as $w | .roles | index([$w]) != null)]
| @csv
'
produces:
"username","role1","role3","role4"
"janedoe",true,false,true
The second last line of the jq filter above could be shortened to:
[.username, (.roles | index($whitelist[]) != null)]
Presumably if there were more than one user, you'd only want the header row once, in which case the above solution would need to be tweaked.
Because index/1
is not as efficient as it might be,
you might like to consider this alternative:
(["username"] + $whitelist),
(.roles as $roles | [.username, ($whitelist[] | IN($roles[]) )])
| @csv
If the number of roles was very large, then it would probably be more efficient to construct a JSON dictionary to avoid repeated linear lookups:
(reduce .roles[] as $r ({}; .[$r]=true)) as $roles
| (["username"] + $whitelist),
[.username, ($roles[$whitelist[]] != null)]
| @csv
For efficiency, and to ensure there's just one header, you could use inputs
with the -n command-line option. Adding the extra fields mentioned in the revised Q, you might end up with:
jq -nr --argjson whitelist '["role1", "role2"]' '
["email", "username", "given_name", "family_name"] as $greenlist
| ($greenlist + $whitelist),
(inputs
| select(has("roles") and any(.roles[] == $whitelist[]; true))
| (reduce ."roles"[] as $r ({}; .[$r]=true)) as $roles
| [ .[$greenlist[]], ($roles[$whitelist[]] != null) ])
| @csv
' users.json