https://bern.korea.ac.kr/pubmed/32818866
$ jq -r '.[] | .denotations | .[] | select(.obj=="drug") | .span | [.begin, .end] | @tsv'
With the jq command above, I can extract the following from the JSON returned by that URL.
377 387
562 579
584 602
659 676
681 699
919 936
941 959
1032 1049
1054 1072
But the output that I really need is shown after this paragraph. The last column is the substring of "text" running from begin+1 to end (supposing the string in "text" is indexed starting from 1). I don't know how to extract this using just jq, because it needs values that are siblings of the parent "denotations" array: "sourceid" for the first column and a substring of "text" for the last. Could anybody show me how to produce the output in this format? Thanks.
32818866 377 387 silica gel
32818866 562 579 7-methoxycoumarin
32818866 584 602 8-prenylkaempferol
32818866 659 676 7-methoxycoumarin
32818866 681 699 8-prenylkaempferol
32818866 919 936 7-methoxycoumarin
32818866 941 959 8-prenylkaempferol
32818866 1032 1049 7-methoxycoumarin
32818866 1054 1072 8-prenylkaempferol
The JSON text is included here for completeness.
[
{
"project": "BERN",
"sourcedb": "PubMed",
"sourceid": "32818866",
"text": "Identification of two bitter components in Zanthoxylum bungeanum Maxim. and exploration of their bitter taste mechanism through receptor hTAS2R14. Bitterness is an inherent organoleptic characteristic affecting the flavor of Zanthoxylum bungeanum Maxim. In this study, the vital bitter components of Z. bungeanum were concentrated through solvent extraction, sensory analysis, silica gel chromatography, and thin-layer chromatographic techniques and subsequently identified by UPLC-Q-TOF-MS. Two components with the highest bitterness intensities (BIs), such as 7-methoxycoumarin and 8-prenylkaempferol were selected. The bitter taste perceived thresholds of 7-methoxycoumarin and 8-prenylkaempferol were 0.062 mmol/L and 0.022 mmol/L, respectively. Moreover, the correlation between the contents of the two bitter components and the BIs of Z. bungeanum were proved. The results of siRNA and flow cytometry showed that 7-methoxycoumarin and 8-prenylkaempferol could activate the bitter receptor hTAS2R14. The results concluded that 7-methoxycoumarin and 8-prenylkaempferol contribute to the bitter taste of Z. bungeanum.",
"denotations": [
{
"id": [
"NCBI:txid328401"
],
"span": {
"begin": 43,
"end": 64
},
"obj": "species"
},
{
"id": [
"CUI-less"
],
"span": {
"begin": 128,
"end": 145
},
"obj": "gene"
},
{
"id": [
"NCBI:txid328401"
],
"span": {
"begin": 225,
"end": 246
},
"obj": "species"
},
{
"id": [
"NCBI:txid328401"
],
"span": {
"begin": 300,
"end": 312
},
"obj": "species"
},
{
"id": [
"MESH:D058428",
"BERN:315272203"
],
"span": {
"begin": 377,
"end": 387
},
"obj": "drug"
},
{
"id": [
"CHEBI:5679",
"BERN:4597103"
],
"span": {
"begin": 562,
"end": 579
},
"obj": "drug"
},
{
"id": [
"MESH:C532177",
"BERN:280529003"
],
"span": {
"begin": 584,
"end": 602
},
"obj": "drug"
},
{
"id": [
"CHEBI:5679",
"BERN:4597103"
],
"span": {
"begin": 659,
"end": 676
},
"obj": "drug"
},
{
"id": [
"MESH:C532177",
"BERN:280529003"
],
"span": {
"begin": 681,
"end": 699
},
"obj": "drug"
},
{
"id": [
"NCBI:txid328401"
],
"span": {
"begin": 841,
"end": 853
},
"obj": "species"
},
{
"id": [
"CHEBI:5679",
"BERN:4597103"
],
"span": {
"begin": 919,
"end": 936
},
"obj": "drug"
},
{
"id": [
"MESH:C532177",
"BERN:280529003"
],
"span": {
"begin": 941,
"end": 959
},
"obj": "drug"
},
{
"id": [
"CUI-less"
],
"span": {
"begin": 979,
"end": 994
},
"obj": "gene"
},
{
"id": [
"CUI-less"
],
"span": {
"begin": 995,
"end": 1003
},
"obj": "gene"
},
{
"id": [
"CHEBI:5679",
"BERN:4597103"
],
"span": {
"begin": 1032,
"end": 1049
},
"obj": "drug"
},
{
"id": [
"MESH:C532177",
"BERN:280529003"
],
"span": {
"begin": 1054,
"end": 1072
},
"obj": "drug"
},
{
"id": [
"NCBI:txid328401"
],
"span": {
"begin": 1107,
"end": 1119
},
"obj": "species"
}
],
"timestamp": "Wed Oct 28 21:43:04 +0000 2020",
"logits": {
"disease": [],
"gene": [
[
{
"start": 128,
"end": 145,
"id": "CUI-less"
},
0.7066106796264648
],
[
{
"start": 979,
"end": 994,
"id": "CUI-less"
},
0.9999749660491943
],
[
{
"start": 995,
"end": 1003,
"id": "CUI-less"
},
0.9052715301513672
]
],
"drug": [
[
{
"start": 377,
"end": 387,
"id": "MESH:D058428\tBERN:315272203"
},
0.999982476234436
],
[
{
"start": 562,
"end": 579,
"id": "CHEBI:5679\tBERN:4597103"
},
0.9999980926513672
],
[
{
"start": 584,
"end": 602,
"id": "MESH:C532177\tBERN:280529003"
},
0.9999980926513672
],
[
{
"start": 659,
"end": 676,
"id": "CHEBI:5679\tBERN:4597103"
},
0.9999980926513672
],
[
{
"start": 681,
"end": 699,
"id": "MESH:C532177\tBERN:280529003"
},
0.9999980330467224
],
[
{
"start": 919,
"end": 936,
"id": "CHEBI:5679\tBERN:4597103"
},
0.9999980926513672
],
[
{
"start": 941,
"end": 959,
"id": "MESH:C532177\tBERN:280529003"
},
0.9999980926513672
],
[
{
"start": 1032,
"end": 1049,
"id": "CHEBI:5679\tBERN:4597103"
},
0.9999980926513672
],
[
{
"start": 1054,
"end": 1072,
"id": "MESH:C532177\tBERN:280529003"
},
0.9999980926513672
]
],
"species": [
[
{
"start": 43,
"end": 64,
"id": "NCBI:txid328401"
},
0.9999997615814209
],
[
{
"start": 225,
"end": 246,
"id": "NCBI:txid328401"
},
0.9999998211860657
],
[
{
"start": 300,
"end": 312,
"id": "NCBI:txid328401"
},
0.9999998211860657
],
[
{
"start": 841,
"end": 853,
"id": "NCBI:txid328401"
},
0.9999998211860657
],
[
{
"start": 1107,
"end": 1119,
"id": "NCBI:txid328401"
},
0.9999998211860657
]
]
},
"elapsed_time": {
"tmtool": 0.991,
"ner": 0.453,
"normalization": 0.172,
"total": 1.617
}
}
]
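Assuming the first column of the desired output is the "sourceid", we can adapt your solution by saving .sourceid and .text in variables before descending into .denotations. jq string slices are zero-based and exclude the end index, so $text[.begin : .end] is exactly the substring you describe as running from begin+1 to end with 1-based indexing: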
.[]
| .sourceid as $id
| .text as $text
| .denotations[]
| select(.obj=="drug")
| .span
| [$id, .begin, .end, $text[.begin : .end] ]
| @tsv
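
To run this from the command line, the filter can be wrapped in a small shell function. This is only a sketch: the function name `extract_spans` and its optional annotation-type argument are my own additions, not part of the question, and it assumes jq is on your PATH.

```shell
# Hypothetical helper (the name is an assumption, not from the question):
# reads BERN-style JSON on stdin and prints one TSV row per annotation,
# in the form sourceid, begin, end, substring.
# $1 is the annotation type to keep; it defaults to "drug".
extract_spans() {
  jq -r --arg type "${1:-drug}" '
    .[]
    | .sourceid as $id          # remember the document id
    | .text as $text            # remember the full text
    | .denotations[]
    | select(.obj == $type)     # keep only the requested annotation type
    | .span
    | [$id, .begin, .end, $text[.begin:.end]]   # zero-based, end-exclusive slice
    | @tsv
  '
}

# Usage, with the URL from the question:
#   curl -s https://bern.korea.ac.kr/pubmed/32818866 | extract_spans drug
```

Passing the type with `--arg` keeps the jq program itself fixed, so the same function could also be called with `gene` or `species`.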