
Select Substring from Larger String and Append to List


I'm doing some API work with Tenable.io and am having trouble selecting substrings. I send requests for scan histories, and the API responds with one long, continuous JSON string covering all scans. I need to pick a few values out of that string and copy them to a list (just for now). Getting data into a list isn't where I'm stuck; I need help with selecting the substrings. Each scan has the following attributes:

  • id
  • status
  • is_archived
  • targets
  • scan_uuid
  • reindexing
  • time_start (unix format)
  • time_end (unix format)

Each of these is followed by a value (see below). I need a way to extract the values following "id":, "scan_uuid":, and "time_start": from the string (and put them in a list just for now).

I'd like to do this without string.index, as this may break the script if the response length changes. There is also a new scan every day, so the overall length of the response will change. Given the nature of the data, I'd imagine the ideal solution would be to specify a condition that selects x characters after "id":, "scan_uuid":, and "time_start":, and appends them to a list, with the output looking something like:

scan_id_10_response = ["12345678", "15b6e7cd-447b-84ab-84d3-48a62b18fe6c", "1639111111", etc, etc]

String is below - I've only included the data for 4 scans for simplicity's sake. I've also changed the values for security reasons, but the length & format of the values are the same.

scan_id_10_response = '{"pagination":{"offset":0,"total":119,"sort":[{"order":"DESC","name":"start_date"}],"limit":100},"history":[\
{"id":12345678,"status":"completed","is_archived":false,"targets":{"custom":false,"default":null},"visibility":"public","scan_uuid":"15b6e7cd-447b-84ab-84d3-48a62b18fe6c","reindexing":null,"time_start":1639111111,"time_end":1639111166},\
{"id":23456789,"status":"completed","is_archived":false,"targets":{"custom":false,"default":null},"visibility":"public","scan_uuid":"8a468cff-c64f-668a-3015-101c218b68ae","reindexing":null,"time_start":1632222222,"time_end":1632222255},\
{"id":34567890,"status":"completed","is_archived":false,"targets":{"custom":false,"default":null},"visibility":"public","scan_uuid":"84ea995a-584a-cc48-e352-8742a38c12ff","reindexing":null,"time_start":1639333333,"time_end":1639333344},\
{"id":45678901,"status":"completed","is_archived":false,"targets":{"custom":false,"default":null},"visibility":"public","scan_uuid":"48a95366-48a5-e468-a444-a4486cdd61a2","reindexing":null,"time_start":1639444444,"time_end":1639444455}\
]}'

Solution

  • You can use the standard json module to parse the JSON string.
    The snippet below gives you a dict you can then work with.

    import json
    c = json.loads(scan_id_10_response)
    

    Now you can, for example, create a list of lists with the desired attributes:

    extracted_data = [[d['id'], d['scan_uuid'], d['time_start']] for d in c['history']]
    

    For this particular example, this returns:

    [[12345678, '15b6e7cd-447b-84ab-84d3-48a62b18fe6c', 1639111111], 
     [23456789, '8a468cff-c64f-668a-3015-101c218b68ae', 1632222222], 
     [34567890, '84ea995a-584a-cc48-e352-8742a38c12ff', 1639333333], 
     [45678901, '48a95366-48a5-e468-a444-a4486cdd61a2', 1639444444]]
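
The question asks for a single flat list of strings rather than a list of lists. A minimal sketch of that variant follows; it assumes a shortened two-scan stand-in for the response (the variable `flat` and the trimmed sample are illustrative, not part of the original answer):

```python
import json

# Shortened stand-in for the API response: two scans, same shape as the
# question's data but with only the three fields of interest.
scan_id_10_response = (
    '{"history":['
    '{"id":12345678,"scan_uuid":"15b6e7cd-447b-84ab-84d3-48a62b18fe6c","time_start":1639111111},'
    '{"id":23456789,"scan_uuid":"8a468cff-c64f-668a-3015-101c218b68ae","time_start":1632222222}'
    ']}'
)

c = json.loads(scan_id_10_response)

# Walk every scan, pick the three keys in order, and convert each value
# to a string so the result is one flat list of strings.
flat = [
    str(d[key])
    for d in c['history']
    for key in ('id', 'scan_uuid', 'time_start')
]
print(flat)
```

Because the values are looked up by key, this keeps working when the response grows or the field order changes.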
    

    If you only want one result at a time, use a generator or iterate over the list:

    gen_extracted = ([d['id'], d['scan_uuid'], d['time_start']] for d in c['history'])
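
As a rough illustration of consuming such a generator one result at a time, the sketch below uses a tiny made-up sample (`sample` and its values are placeholders, not real scan data):

```python
import json

# Tiny placeholder response with the same structure as the real one.
sample = (
    '{"history":['
    '{"id":1,"scan_uuid":"a","time_start":10},'
    '{"id":2,"scan_uuid":"b","time_start":20}'
    ']}'
)
c = json.loads(sample)

gen_extracted = ([d['id'], d['scan_uuid'], d['time_start']] for d in c['history'])

# next() pulls a single result; the generator remembers its position.
first = next(gen_extracted)

# Whatever has not been consumed yet can still be iterated or collected.
rest = list(gen_extracted)

print(first)
print(rest)
```

A generator can only be consumed once; after `list(gen_extracted)` it is exhausted and has to be recreated to iterate again.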
    

    If you don't want to work with a dict, I would recommend looking into regular expressions.
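
For completeness, here is a minimal sketch of what the regular-expression route could look like. It assumes the three keys always appear in the order id, scan_uuid, time_start within each scan, as in the sample data, and the pattern itself is an illustration rather than something from the original answer. Parsing with json is generally more robust than this.

```python
import re

# Two scans in the same format as the question's response string.
scan_id_10_response = (
    '{"history":['
    '{"id":12345678,"status":"completed","scan_uuid":"15b6e7cd-447b-84ab-84d3-48a62b18fe6c",'
    '"reindexing":null,"time_start":1639111111,"time_end":1639111166},'
    '{"id":23456789,"status":"completed","scan_uuid":"8a468cff-c64f-668a-3015-101c218b68ae",'
    '"reindexing":null,"time_start":1632222222,"time_end":1632222255}'
    ']}'
)

# Assumed key order: "id", then "scan_uuid", then "time_start".
# The non-greedy .*? skips over whatever sits between the keys.
pattern = re.compile(
    r'"id":(\d+).*?"scan_uuid":"([0-9a-f-]+)".*?"time_start":(\d+)'
)

# findall returns one tuple of captured strings per scan.
matches = pattern.findall(scan_id_10_response)

# Flatten the tuples into a single list of strings.
flat = [value for match in matches for value in match]
print(flat)
```

Every captured value is a string, so no conversion is needed to match the output format shown in the question.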