Consider the following CSV:
email/1,email/2
abc@xyz.org,bob@pass.com
You can easily convert it to JSON (taking into account the paths defined by the keys) with Miller:
mlr --icsv --ojson --jflatsep '/' cat file.csv
[ { "email": ["abc@xyz.org", "bob@pass.com"] } ]
Now, if the paths are 0-indexed in the CSV (which is surely more common):
email/0,email/1
abc@xyz.org,bob@pass.com
Then, without prior knowledge of the field names, it seems that you'll have to rewrite the whole conversion:
Edit: replaced the hard-coded / with the FLATSEP built-in variable:
mlr --icsv --flatsep '/' put -q '
  begin { @labels = []; print "[" }
  # translate the original CSV header from 0-indexed to 1-indexed
  NR == 1 {
    i = 1;
    for (k in $*) {
      @labels[i] = joinv( apply( splita(k,FLATSEP), func(e) {
        return typeof(e) == "int" ? e+1 : e
      }), FLATSEP );
      i += 1;
    }
  }
  NR > 1 { print @object, "," }
  # create an object from the translated labels and the row values
  o = {};
  i = 1;
  for (k,v in $*) {
    o[@labels[i]] = v;
    i += 1;
  }
  @object = arrayify( unflatten(o,FLATSEP) );
  end { if (NR > 0) { print @object } print "]" }
' file.csv
I would like to know if I'm missing something obvious, like a command-line option or a way to rename the fields with the put verb, or maybe something else? You're also welcome to share your insights about the code above, as I'm not really confident in my Miller programming skills.
Update:
With @aborruso's approach of pre-processing the CSV header, this could be reduced to:
Note: I didn't keep the regextract part because it relies on knowing the CSV header in advance.
mlr --csv -N --flatsep '/' put '
  NR == 1 {
    for (i,k in $*) {
      $[i] = joinv( apply( splita(k,FLATSEP), func(e) {
        return typeof(e) == "int" ? e+1 : e
      }), FLATSEP );
    }
  }
' file.csv |
mlr --icsv --flatsep '/' --ojson cat
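For comparison, the same header pre-processing step could be sketched outside Miller, for example with awk. This is only an illustration under stated assumptions: shift_header is a hypothetical helper name (not part of Miller), the path separator defaults to / and is assumed to be a single character, and the header is assumed to be plain CSV without quoted fields containing commas.

```shell
# shift_header (hypothetical helper): read CSV on stdin, add 1 to every purely
# numeric path segment of the header line, and pass data rows through unchanged.
# Assumptions: single-character separator (default "/"), no quoted header
# fields containing commas.
shift_header() {
  awk -v sep="${1:-/}" '
    BEGIN { FS = OFS = "," }
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        n = split($i, parts, sep)          # split one header label into path segments
        out = ""
        for (j = 1; j <= n; j++) {
          if (parts[j] ~ /^[0-9]+$/)       # only integer segments are shifted
            parts[j] = parts[j] + 1
          out = out (j > 1 ? sep : "") parts[j]
        }
        $i = out                           # header line is rebuilt with OFS
      }
    }
    { print }                              # data rows are printed as read
  '
}
```

The output of shift_header < file.csv can then be piped into the second mlr command above (mlr --icsv --flatsep '/' --ojson cat). Unlike the Miller version, this sketch does not parse CSV quoting, so it only suits simple headers.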
Even if there are workarounds like using the rename verb (when you know the header in advance) or pre-processing the CSV header, I still hope that Miller's author will add a command-line option to deal with this kind of 0-indexed external data; adding a DSL function like arrayify0 (and flatten0) could also prove useful in some cases.
I would like to know if I'm missing something obvious, like a command line option or a way to rename the fields with put verb, or maybe something else?
Starting from this
email/0,email/1
abc@xyz.org,bob@pass.com
you can treat the CSV header as an ordinary data row (the -N option) and run
mlr --csv -N put 'if (NR == 1) {for (k in $*) {$[k] = "email/".string(int(regextract($[k],"[0-9]+"))+1)}}' input.csv
to get
email/1,email/2
abc@xyz.org,bob@pass.com