I want to parse the URL generated by a search query into its components.
The following URL was generated by searching Google with the query phrase "url".
How can I split it into the host "www.google.com" and the query "url"?
Is there a Python package that does this parsing?
Expected output: {"url": "www.google.com", "query": "url"}
You can simply use the urllib.parse module, which is part of the Python standard library.
>>> from urllib.parse import urlparse, parse_qs
>>> url = "https://www.google.com/search?q=url&rlz=1C5GCEM_enIN1101IN1101&oq=url&gs_lcrp=EgZjaHJvbWUyFAgAEEUYORhDGIMBGLEDGIAEGIoFMg4IARBFGCcYOxiABBiKBTIMCAIQABhDGIAEGIoFMgYIAxBFGDwyBggEEEUYPDIGCAUQRRg8MgYIBhBFGDwyBggHEEUYPNIBCDU1NjhqMGo3qAIAsAIA&sourceid=chrome&ie=UTF-8"
>>> r = urlparse(url)
>>> {"url": r.hostname, "query": parse_qs(r.query).get('q')}
{'url': 'www.google.com', 'query': ['url']}
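
Note that parse_qs returns a list for each key, because a parameter can appear more than once in a query string. If you want a plain string, as in the expected output of the question, one option is to take the first value. Here is a minimal sketch, assuming you only care about the first occurrence; the helper name split_search_url and its param argument are made up for illustration and are not part of urllib:

```python
from urllib.parse import urlparse, parse_qs


def split_search_url(url, param="q"):
    """Return the host and the first value of one query parameter.

    split_search_url and param are illustrative names, not part of urllib.
    """
    parts = urlparse(url.strip())  # drop stray whitespace around the URL
    values = parse_qs(parts.query).get(param, [])  # each key maps to a list
    return {"url": parts.hostname, "query": values[0] if values else None}


result = split_search_url(
    "https://www.google.com/search?q=url&sourceid=chrome&ie=UTF-8"
)
print(result)  # {'url': 'www.google.com', 'query': 'url'}
```

If the parameter is missing, this sketch returns None for "query" instead of raising an error; adjust that to whatever fits your case.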