I have a gzip file (.gz) which, when decompressed, contains one JSON object per line. Below is one sample JSON line. I am trying to extract specific fields only into a CSV file using jq. I want to extract these fields on the condition that the type key has the value dissertation only.
{
"id": "https://openalex.org/W2777209504",
"doi": "https://doi.org/10.24026/1818-1384.1(42).2013.77470",
"display_name": "Hyperandrogenism as a factor of reproductive losses",
"title": "Hyperandrogenism as a factor of reproductive losses",
"publication_year": 2013,
"publication_date": "2013-03-27",
"ids": {
"openalex": "https://openalex.org/W2777209504",
"doi": "https://doi.org/10.24026/1818-1384.1(42).2013.77470",
"mag": 2777209504
},
"type": "journal-article",
"counts_by_year": [
{
"year": 2019,
"cited_by_count": 1
}
],
"cited_by_api_url": "https://api.openalex.org/works?filter=cites:W2777209504",
"updated_date": "2021-11-03",
"created_date": "2018-01-05",
"abstract_inverted_index": {}
}
I tried the two commands below and neither of them worked:
gzcat -c sample.gz | jq -rc '[.doi,.title, .publication_year, .publication_date, .type] | select(.type |contains("dissertation")) | @csv'>target.csv
gzcat -c sample.gz | jq -rc '[.doi,.title, .publication_year, .publication_date, .type] | select(.type=="dissertation") | @csv'>target.csv
The output received for both of them is:
jq: error (at <stdin>:108753): Cannot index string with string "title"
I have tried every way I can think of to filter my JSON Lines file, but without success. Any pointers would be a great help.
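In both of your attempts, the select is incorrectly formulated (or in the wrong place, depending on your point of view): by the time it runs, its input is the array you have just built, not the original object, so .type no longer refers to the record's type key. This would work: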
select(.type == "dissertation")
| [.doi,.title, .publication_year, .publication_date, .type]
| @csv
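For completeness, here is a minimal end-to-end sketch of that filter inside a full pipeline. The two sample records are made up for illustration, and `gzip -dc` is used as an assumed portable stand-in for `gzcat`; the file names sample.gz and target.csv are the ones from the question.

```shell
# Build a tiny two-record sample (hypothetical data) so the pipeline can be tried in isolation.
printf '%s\n' \
  '{"doi":"https://doi.org/10.1/a","title":"A thesis","publication_year":2013,"publication_date":"2013-03-27","type":"dissertation"}' \
  '{"doi":"https://doi.org/10.1/b","title":"A paper","publication_year":2014,"publication_date":"2014-01-01","type":"journal-article"}' \
  | gzip -c > sample.gz

# Filter on the object first, then build the array and format it as CSV.
gzip -dc sample.gz \
  | jq -r 'select(.type == "dissertation")
           | [.doi, .title, .publication_year, .publication_date, .type]
           | @csv' > target.csv
```

With this sample, target.csv should end up with a single row for the dissertation record; the journal-article record is dropped before the array is ever built.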