
How to extract exact values from a json file with xidel?


Excuse my English, I am not a native speaker

I'm new to this so I don't know much

I am trying to extract some values from a JSON file with xidel, using the following command in Windows cmd, but it is not working:

xidel MyFile.json -e '$json//options/option/*[@option_id="D-ES"]/content_id'

The JSON file generally has three options: English, Spanish and Portuguese. I only want the values related to Spanish.

I want to extract the following values:

"group_id": "******",
"content_id": "******",
"current_content": "*****",
"option_id": "D-ES",
"subtitle": *****,
"id": "ES",
"desc": "Español",

And put the extracted values as follows

"group_id"-"*****","content_id"-"*****","current_content"-"*****","option_id"-"D-ES"-"subtitle"- *****,"id"- "ES""desc"- "Español",

This is part of my json file

{
  "original": {
    "id": "ING",
    "desc": "Inglés"
  },
  "dubbed": "true",
  "subbed": "false",
  "options": {
    "option": [
      {
        "group_id": "922450",
        "content_id": "284951",
        "current_content": "false",
        "option_id": "D-ES",
        "audio": "ES",
        "subtitle": null,
        "option_name": "dubbed",
        "id": "ES",
        "desc": "Español",
        "label_short": "Dob. Español",
        "label_large": "Doblada al Español",
        "intro_start_time": null,
        "intro_finish_time": null
      },
      {
        "group_id": "275495",
        "content_id": "243856",
        "current_content": "false",
        "option_id": "D-PT",
        "audio": "PT",
        "subtitle": null,
        "option_name": "dubbed",
        "id": "PT",
        "desc": "Portugués",
        "label_short": "Dob. Portugués",
        "label_large": "Doblada al Portugués",
        "intro_start_time": null,
        "intro_finish_time": null
      },
      {
        "group_id": "248954",
        "content_id": "245238",
        "current_content": "false",
        "option_id": "O-EN",
        "audio": "ORIGINAL",
        "subtitle": null,
        "option_name": "original",
        "id": "EN",
        "desc": "Inglés",
        "label_short": "Id. Inglés",
        "label_large": "Idioma Original Inglés",
        "intro_start_time": null,
        "intro_finish_time": null,
      }
    ]
  }
}

What command should I use to extract the values related to Spanish?


Solution

  • xidel MyFile.json -e '$json//options/option/*[@option_id="D-ES"]/content_id'
    
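    • Windows cmd does not treat single quotes as string delimiters, so swap the quotes when using the Windows binary: double quotes around the query, single quotes inside it. The quoting style above is for Linux shells.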
    • To navigate the "option"-array use (option)() (or alternatively the XQuery 3.1 syntax option?*).
    • The @ is only necessary if your input is HTML/XML.

    So the correct query would be -e "$json//options/(option)()[option_id='D-ES']/content_id".

    I want to extract the following values [...]

    xidel -s MyFile.json -e "$json//options/(option)()[option_id='D-ES']/(group_id,content_id,current_content,option_id,subtitle,id,desc)"
    922450
    284951
    false
    D-ES
    ES
    Español
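
For readers who want to cross-check this result without xidel, the same selection can be sketched in Python. This is only an illustration under assumptions: the inline RAW string is a trimmed stand-in for MyFile.json, and pick_option is a hypothetical helper name, not part of xidel.

```python
import json

# Trimmed stand-in for MyFile.json (assumption: only the keys needed here).
RAW = """
{
  "options": {
    "option": [
      {"group_id": "922450", "content_id": "284951", "current_content": "false",
       "option_id": "D-ES", "subtitle": null, "id": "ES", "desc": "Español"},
      {"group_id": "275495", "content_id": "243856", "current_content": "false",
       "option_id": "D-PT", "subtitle": null, "id": "PT", "desc": "Portugués"}
    ]
  }
}
"""

KEYS = ["group_id", "content_id", "current_content",
        "option_id", "subtitle", "id", "desc"]


def pick_option(doc, option_id):
    """Return the first entry of options.option whose option_id matches, else None."""
    for entry in doc["options"]["option"]:
        if entry.get("option_id") == option_id:
            return entry
    return None


doc = json.loads(RAW)
spanish = pick_option(doc, "D-ES")

# JSON null becomes None in Python; it is kept in the list instead of being dropped.
values = [spanish[key] for key in KEYS]
print(values)
```

Unlike the xidel output above, which shows six lines, the Python list keeps the null "subtitle" as an explicit None entry.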
    

    To include the key names as well, I would do:

    xidel -s MyFile.json -e "for $x in ('group_id','content_id','current_content','option_id','subtitle','id','desc') return $json//options/concat($x,' - ',(option)()[option_id='D-ES']($x))"
    group_id - 922450
    content_id - 284951
    current_content - false
    option_id - D-ES
    subtitle
    id - ES
    desc - Español
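
The pairing of each key with its value can also be sketched in Python, purely as a cross-check of the logic. Assumptions: the spanish dictionary is a hand-copied stand-in for the "D-ES" entry of MyFile.json, label_pairs is a hypothetical helper name, and a null value is rendered as an empty string.

```python
# Stand-in for the "D-ES" entry of MyFile.json (assumption: trimmed to the keys used here).
spanish = {
    "group_id": "922450",
    "content_id": "284951",
    "current_content": "false",
    "option_id": "D-ES",
    "subtitle": None,  # JSON null
    "id": "ES",
    "desc": "Español",
}

KEYS = ["group_id", "content_id", "current_content",
        "option_id", "subtitle", "id", "desc"]


def label_pairs(entry, keys, sep=" - "):
    """Build one 'key<sep>value' string per key; None is rendered as an empty string."""
    lines = []
    for key in keys:
        value = entry.get(key)
        text = "" if value is None else str(value)
        lines.append(f"{key}{sep}{text}")
    return lines


lines = label_pairs(spanish, KEYS)
for line in lines:
    print(line)
```

Iterating over a fixed list of keys, as the xidel for-loop does, keeps the output order independent of the key order in the file.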
    

    If you really want the surrounding double quotes, you can just add them (escaped with a backslash) in the concat()-function...

    -e ".../concat('\"',$x,'\"-\"',(option)()[option_id='D-ES']($x),'\"')"
    

    ...or you can use xidel's "Extended Strings" syntax...

    -e ".../x'\"{$x}\"-\"{(option)()[option_id='D-ES']($x)}\"'"
    

    ...or use the XQuery notation for a double quote...

    --xquery ".../x'&quot;{$x}&quot;-&quot;{(option)()[option_id='D-ES']($x)}&quot;'"
    

    The output:

    "group_id"-"922450"
    "content_id"-"284951"
    "current_content"-"false"
    "option_id"-"D-ES"
    "subtitle"-""
    "id"-"ES"
    "desc"-"Español"
    

    And finally, to turn this sequence into a single line where each item is separated by a comma:

    xidel -s MyFile.json -e "join(for $x in ('group_id','content_id','current_content','option_id','subtitle','id','desc') return $json//options/x'\"{$x}\"-\"{(option)()[option_id='D-ES']($x)}\"',',')"
    "group_id"-"922450","content_id"-"284951","current_content"-"false","option_id"-"D-ES","subtitle"-"","id"-"ES","desc"-"Español"
    

    The final command/query, prettified:

    -e "
      join(
        for $x in (
          'group_id','content_id','current_content',
          'option_id','subtitle','id','desc'
        ) return
        $json//options/x'\"{$x}\"-\"{(option)()[option_id='D-ES']($x)}\"',
        ','
      )
    "