I am trying to get some values from JSON via jq in bash. With small inputs it works fine, but with big JSON it gets very slow, roughly one value every 2-3 seconds. Example of my code:
pagejson=$(curl -s -A "some useragent" "url")
pid=$(jq '.page_ids[]' idlist.json)
for id in $pid
do
    echo "$pagejson" | jq -r '.page[]|select(.id=='"$id"')|.url' >> path.url
done
The "pid" variable is a list of ids that I prepare before running the script. It may contain 700-1000 ids. Example object of the JSON:
{
    "page": [
        {
            "url": "some url",
            "id": some numbers
        },
        {
            "url": "some url",
            "id": some numbers
        }
    ]
}
Is there any way to speed it up? In JavaScript the same lookup works much faster. Example of the JavaScript:
// First sort the pages into the order of the id list
var url = "";
var sortedjson = ids.map(id => obj.find(page => page.id === id));
// Then collect the urls
for (var x = 0; x < sortedjson.length; x++) {
    url += sortedjson[x].url;
}
Should I sort the JSON like in the JavaScript for better performance? I haven't tried it because I don't know how.
Edit: Replaced the "pid" variable with json to use less code, and for id in $(echo $pid) with for id in $pid. But it still slows down when the id list contains more than about 50 ids.
Calling jq once per id is always going to be slow. Don't do that -- call jq just once, and have it match against the full set.
You can accomplish that by passing the entire comma-separated list of ids into your one copy of jq, and letting jq itself do the work of splitting that string into individual items (and then putting them in a dictionary for fast access).
For example:
pid="24885,73648,38758,8377,747"
jq -r --arg pidListStr "$pid" '
  # Build a lookup table keyed by id string: {"24885": true, "73648": true, ...}
  ($pidListStr | [split(",")[] | {(.): true}] | add) as $pidDict |
  # Keep only the pages whose id is in the table, then emit the url
  .page[] | select($pidDict[.id | tostring]) | .url
' <<<"$pagejson"
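Putting it together with your setup: a sketch of the full script, assuming idlist.json has the page_ids shape from your snippet (map(tostring) keeps join happy when the ids are numbers, and -r writes the urls unquoted):

pid=$(jq -r '.page_ids | map(tostring) | join(",")' idlist.json)
pagejson=$(curl -s -A "some useragent" "url")
jq -r --arg pidListStr "$pid" '
  ($pidListStr | [split(",")[] | {(.): true}] | add) as $pidDict |
  .page[] | select($pidDict[.id | tostring]) | .url
' <<<"$pagejson" > path.url

This runs one curl and two jq processes total, regardless of whether the list has 5 ids or 1000.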
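Note that this prints the urls in the order they appear in .page, not in the order of your id list. If you want the same ordering your JavaScript produces, invert the dictionary: map each page id to its url once, then walk the id list. A sketch, assuming each id occurs at most once in .page:

jq -r --arg pidListStr "$pid" '
  # One dict mapping each page id (as a string) to its url
  (.page | map({(.id | tostring): .url}) | add) as $urlById |
  # Walk the requested ids in order; skip any id with no matching page
  $pidListStr | split(",")[] | $urlById[.] // empty
' <<<"$pagejson" > path.url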