I'm trying to extract a json from a script tag using regular expression, but I can't get it to match - however, my pattern works on https://regex101.com/ (using the page source to match against).
import requests
import re
req = requests.get(myURL)
matches = re.findall("/reports=\[([^]]+)\]/g", req.text)
print(matches)
The start of the json looks like this:
/*! jQuery v1.10.2 | (c) 2005, 2013 jQuery Foundation, Inc. | jquery.org/license
//@ sourceMappingURL=jquery.min.map
*/
(function(e,t){var n,r,i=typeof t,o=e.location,a=e.document,s=a.documentElement,l=e.jQuery,u=e.$,c={},p=...
var reports=[
{
"Id": "ddb56456-ae7e-46da-8251-97630e1536f7",
Any pointers on what I'm doing wrong? If I write req.text to a text file then copy it into regex101.com I can match against it using the same pattern above.
You don't use the "slash" delimiters when you're specifying a string like this. Plus, the "g" flag is implied by findall
. Just use:
matches = re.findall("reports=\[([^]]+)\]", req.text)