I'm currently doing some API work with Tenable.io, and I'm having some trouble selecting substrings. I'm sending requests for scan histories, and the API responds with a continuous string of all scans in JSON format. The response I get is a very large continuous string of data, and I need to select some substrings (a few values), and copy that to a list (just for now). Getting data into a list isn't where I'm stuck - I require some serious assistance with selecting the substrings I need. Each scan has the following attributes:
Each of these has a value/boolean following it (see below). I need a way to extract the values following "id":
, "scan_uuid:"
, and "time_start":
from the string (and put it in a list just for now).
I'd like to do this without string.index
, as this may break the script if the response length changes. There is also a new scan everyday, so the overall length of the response will change. Due to the nature of the data, I'd imagine the ideal solution would be to specify a condition that will select x amount of characters after "id":
, "scan_uuid:"
, and "time_start":
, and append them to a list, with the output looking something like:
scan_id_10_response = ["12345678", ""15b6e7cd-447b-84ab-84d3-48a62b18fe6c", "1639111111", etc, etc]
String is below - I've only included the data for 4 scans for simplicity's sake. I've also changed the values for security reasons, but the length & format of the values are the same.
scan_id_10_response = '{"pagination":{"offset":0,"total":119,"sort":[{"order":"DESC","name":"start_date"}],"limit":100},"history":[\
{"id":12345678,"status":"completed","is_archived":false,"targets":{"custom":false,"default":null},"visibility":"public","scan_uuid":"15b6e7cd-447b-84ab-84d3-48a62b18fe6c","reindexing":null,"time_start":1639111111,"time_end":1639111166},\
{"id":23456789,"status":"completed","is_archived":false,"targets":{"custom":false,"default":null},"visibility":"public","scan_uuid":"8a468cff-c64f-668a-3015-101c218b68ae","reindexing":null,"time_start":1632222222,"time_end":1632222255},\
{"id":34567890,"status":"completed","is_archived":false,"targets":{"custom":false,"default":null},"visibility":"public","scan_uuid":"84ea995a-584a-cc48-e352-8742a38c12ff","reindexing":null,"time_start":1639333333,"time_end":1639333344},\
{"id":45678901,"status":"completed","is_archived":false,"targets":{"custom":false,"default":null},"visibility":"public","scan_uuid":"48a95366-48a5-e468-a444-a4486cdd61a2","reindexing":null,"time_start":1639444444,"time_end":1639444455}\
]}'
Basically you can use the standard json module to parse the json string.
Using that code snippet you obtain a dict you can then work with.
import json
c = json.loads(scan_id_10_response)
Now you can for example create a list of list with the desired attributes:
extracted_data = [[d['id'], d['scan_uuid'], d['time_start']] for d in c['history']]
This returns for this particular example:
[[12345678, '15b6e7cd-447b-84ab-84d3-48a62b18fe6c', 1639111111],
[23456789, '8a468cff-c64f-668a-3015-101c218b68ae', 1632222222],
[34567890, '84ea995a-584a-cc48-e352-8742a38c12ff', 1639333333],
[45678901, '48a95366-48a5-e468-a444-a4486cdd61a2', 1639444444]]
If you only want one result at a time use a generator or iterate over the list
gen_extracted = ([d['id'], d['scan_uuid'], d['time_start']] for d in x['history'])
If you dont want to work with a dict i would reccomend you a look into regular expressions.