regex with python

I want to extract part of a file: it should start with [{"linkId":"changeDriveLink" and end with the text just before ,"zone"

my input is:

[{"linkIdsd":"changeDridsdve [{"linkId":"changeDriveLink","url":"/drive
/3696434","zoneId":"forceAjax"},{"linkId":"printProductsFormSubst","url":"/drive/rayon.pagetemplate.substitutionlist.printproductsformsubst","zoneId":"forc
,"zone"

and I want to have:

[{"linkId":"changeDriveLink","url":"/drive
    /3696434","zoneId":"forceAjax"},{"linkId":"printProductsFormSubst","url":"/drive/rayon.pagetemplate.substitutionlist.printproductsformsubst","zoneId":"forc

How can I do this with a regex, please?

Solution

The regular expression

re.compile(r'^\[\{"linkId":"changeDriveLink".*,"zone"', re.DOTALL)

should do this. The .* in the middle matches any sequence of characters, and re.DOTALL makes sure that even newlines are matched, in case your JSON is pretty-printed.
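To illustrate what re.DOTALL changes, here is a minimal sketch. The sample string is made up for the illustration (it is not the real file) and simply contains a newline between the start marker and ,"zone":

```python
import re

# Hypothetical two-line sample; the real file contents are not shown in full.
sample = '[{"linkId":"changeDriveLink","url":"/drive\n/3696434","zone"'

pattern_text = r'^\[\{"linkId":"changeDriveLink".*,"zone"'

# Without re.DOTALL the dot does not match a newline, so the match fails.
without_dotall = re.compile(pattern_text).match(sample)

# With re.DOTALL the dot also matches the newline, so the match succeeds.
with_dotall = re.compile(pattern_text, re.DOTALL).match(sample)

print(without_dotall is None)
print(with_dotall is not None)
```

Note that this pattern only matches when the text begins with the start marker, because of the ^ anchor.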

But I think it would be better to load the file with the json package and then check whether it satisfies your requirements:

import json

with open('filename_here.json', 'r') as json_file:
    data = json.load(json_file)

# data is expected to be a list whose first element is a dict
if data[0]['linkId'] == 'changeDriveLink':
    print('OK')
else:
    print('not OK')

Based on the string you've given, your JSON is a list (array) whose first element is a dict, and that dict has a 'linkId' key with the value 'changeDriveLink'. This is what I check in the if statement.

EDIT:

Now I understand what you want to do. First, you should omit the ^ character from the beginning of the expression, since the string you provided is not the start of the JSON file; it should only be the beginning of the result. Then you can get the string you want with, e.g., grouping:

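import re

# The leading .* skips whatever precedes the part you want; re.DOTALL lets . match newlines too.
pattern = re.compile(r'.*(?P<result>\[\{"linkId":"changeDriveLink".*),"zone"', re.DOTALL)
match_obj = pattern.match('your_json_string')  # replace with the actual text
if match_obj is not None:
    the_string_you_want = match_obj.group('result')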

What I used here is called a named group; you can read more about it in the documentation.
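As a rough check, the named-group approach can be tried on the sample from the question. This is only a sketch: the text below is retyped from the question (with the line breaks as they appear there), and it uses search instead of match with a leading .*, which is a small variation rather than the exact code above. With search the result starts at the first occurrence of the start marker, while a leading greedy .* would start it at the last one; the sample contains only one occurrence, so both give the same result here.

```python
import re

# Sample retyped from the question; the url value is split across two lines.
text = (
    '[{"linkIdsd":"changeDridsdve [{"linkId":"changeDriveLink","url":"/drive\n'
    '/3696434","zoneId":"forceAjax"},{"linkId":"printProductsFormSubst",'
    '"url":"/drive/rayon.pagetemplate.substitutionlist.printproductsformsubst",'
    '"zoneId":"forc\n'
    ',"zone"'
)

# The named group "result" captures everything from the start marker
# up to (but not including) the closing ,"zone" marker.
pattern = re.compile(
    r'(?P<result>\[\{"linkId":"changeDriveLink".*),"zone"',
    re.DOTALL,
)

match_obj = pattern.search(text)
result = match_obj.group('result') if match_obj is not None else None
print(result)
```

The ,"zoneId" keys inside the data do not end the match early, because the pattern requires a closing quote directly after zone.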