Search code examples
pythondictionaryscrapyhtml-parsing

How to parse a html string using python scrapy


I have a list of html input elements as below.

lists=[<input type="hidden" name="csrf_token" value="jZdkrMumEBeXQlUTbOWfInDwNhtVHGSxKyPvaipoAFsYqCgRLJzc">,
<input type="text" class="form-control" id="username" name="username">,
<input type="password" class="form-control" id="password" name="password">,
<input type="submit" value="Login" class="btn btn-primary">]

From these I need to extract the attribute values of name, type, and value

For eg: Consider the input <input type="hidden" name="csrf_token" value="jZdkrMumEBeXQlUTbOWfInDwNhtVHGSxKyPvaipoAFsYqCgRLJzc"> then I need output as following dictionary format {'csrf_token':('hidden',"jZdkrMumEBeXQlUTbOWfInDwNhtVHGSxKyPvaipoAFsYqCgRLJzc")}

Could anyone please a guidance to solve this


Solution

  • I recommend you to use the Beautiful Soup Python library (https://pypi.org/project/beautifulsoup4/) to get the HTML content and the values of the elements. There are functions already created for that purpose.