I want to extract part of a file. The result should start with [{"linkId":"changeDriveLink"
and end just before the text ,"zone"
My input is:
[{"linkIdsd":"changeDridsdve [{"linkId":"changeDriveLink","url":"/drive
/3696434","zoneId":"forceAjax"},{"linkId":"printProductsFormSubst","url":"/drive/rayon.pagetemplate.substitutionlist.printproductsformsubst","zoneId":"forc
,"zone"
and I want to get:
[{"linkId":"changeDriveLink","url":"/drive
/3696434","zoneId":"forceAjax"},{"linkId":"printProductsFormSubst","url":"/drive/rayon.pagetemplate.substitutionlist.printproductsformsubst","zoneId":"forc
How can I do this with a regex, please?
The regular expression
re.compile(r'^\[\{"linkId":"changeDriveLink".*,"zone"', re.DOTALL)
should do this. The .* in the middle matches any sequence of characters, and re.DOTALL makes sure that even newlines are matched, in case your JSON is pretty-printed.
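To illustrate what re.DOTALL changes, here is a minimal sketch. The two-line sample string is made up for the demonstration (it is a shortened variant of the input in the question), and the variable names are only illustrative:

```python
import re

# Hypothetical sample: the text to match spans two lines.
text = '[{"linkId":"changeDriveLink","url":"/drive\n/3696434","zone"'

expression = r'^\[\{"linkId":"changeDriveLink".*,"zone"'

without_dotall = re.compile(expression)
with_dotall = re.compile(expression, re.DOTALL)

# Without DOTALL, "." does not match "\n", so ".*" cannot reach ',"zone"'.
print(without_dotall.match(text) is None)

# With DOTALL, "." also matches "\n", so the whole string matches.
print(with_dotall.match(text) is not None)
```

Both lines should print True: the first pattern fails on the line break, while the second one crosses it.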
But I think it would be better to load the file with the json package and then check whether it satisfies your requirements:
import json

with open('filename_here.json', 'r') as json_file:
    data = json.load(json_file)

if data[0]['linkId'] == 'changeDriveLink':
    print('OK')      # the first element has the expected linkId
else:
    print('not OK')
Based on the string you've given, your JSON is a list (array), its first element is a dict, and that dict has a 'linkId' key with the value 'changeDriveLink'. This is what I check in the if statement.
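If the file might not have exactly that shape, the same check can be made more defensive. The following is only a sketch: the function name first_link_is_change_drive and the sample strings are assumptions for illustration, and it works on a string with json.loads instead of reading a file:

```python
import json


def first_link_is_change_drive(json_text):
    """Return True if json_text is a JSON array whose first element
    is an object with "linkId" equal to "changeDriveLink"."""
    try:
        data = json.loads(json_text)
    except json.JSONDecodeError:
        return False  # not valid JSON at all
    if not isinstance(data, list) or not data:
        return False  # not an array, or an empty one
    first = data[0]
    return isinstance(first, dict) and first.get("linkId") == "changeDriveLink"


# Hypothetical sample strings, for demonstration only.
good = '[{"linkId":"changeDriveLink","url":"/drive/3696434","zoneId":"forceAjax"}]'
bad = '{"linkId":"somethingElse"}'

print(first_link_is_change_drive(good))
print(first_link_is_change_drive(bad))
```

This avoids an IndexError, KeyError or TypeError when the JSON is empty or structured differently than expected.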
EDIT:
Now I understand what you want to do.
First, you should omit the ^ character from the beginning of the expression, since the string you provided is not the start of the JSON file; it should only be the beginning of the result.
Then, you can get the string you want with e.g. grouping:
import re

pattern = re.compile(r'.*(?P<result>\[\{"linkId":"changeDriveLink".*),"zone"', re.DOTALL)
match_obj = pattern.match('your_json_string')  # replace with the actual text of your file
if match_obj is not None:
    the_string_you_want = match_obj.group('result')
What I used here is called named grouping; you can read more about it in the documentation.
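For completeness, here is a sketch that runs the named-group approach on the sample input from the question. It is a variation, not the exact code above: it uses pattern.search instead of a leading .* with pattern.match, and strips the trailing line break. The variable names raw and result are only illustrative:

```python
import re

# Sample input from the question, with its line breaks kept as posted.
raw = (
    '[{"linkIdsd":"changeDridsdve [{"linkId":"changeDriveLink","url":"/drive\n'
    '/3696434","zoneId":"forceAjax"},{"linkId":"printProductsFormSubst",'
    '"url":"/drive/rayon.pagetemplate.substitutionlist.printproductsformsubst",'
    '"zoneId":"forc\n'
    ',"zone"'
)

pattern = re.compile(
    r'(?P<result>\[\{"linkId":"changeDriveLink".*),"zone"',
    re.DOTALL,
)

# search() scans for the first position where the pattern matches,
# so no leading ".*" is needed.
match_obj = pattern.search(raw)

# The captured text ends with the line break before ',"zone"'; rstrip() drops it.
result = match_obj.group('result').rstrip() if match_obj is not None else None
print(result)
```

Note that .* is greedy, so the group runs up to the last ,"zone" in the text. That matters here because ,"zoneId" also begins with ,"zone": a non-greedy .*? would stop at the first "zoneId" instead.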