python web-scraping beautifulsoup

How can I scrape data from a JavaScript function that contains loops?


I'm learning web scraping with Python and Beautiful Soup, and I ran into a problem scraping data that sits inside a JavaScript function containing a loop. The data I'm trying to get is inside an if statement within that loop, as shown in the page source below.

I want to scrape "Password : h7s6sh".

<SCRIPT>
function passWord() {
var testV = 1;
var pass1 = prompt('Please Enter Your Password',' ');
while (testV < 3) {
if (!pass1) 
history.go(-1);
if (pass1.toLowerCase() == "ratedr") 
{
  alert('You Got it Right!');
  document.write("<center><h1>Username : [email protected]<p>Password :  h7s6sh</p></h1><p>NOTE : Visit daily Everyday</p><p><h1>Thank You!</h1></p></center>");break;
}
testV+=1;
var pass1 = prompt('Access Denied - Password Incorrect, Please Try Again.','Password');
}
if (pass1.toLowerCase()!="password" & testV ==3) 
history.go(-1);
return " ";
} 
</SCRIPT>

This is what I have tried so far:

>>> script_mim.text

u'\nfunction passWord() {\nvar testV = 1;\nvar pass1 = prompt(\'Please Enter Your Password\',\' \');\nwhile (testV < 3) {\nif (!pass1) \nhistory.go(-1);\nif (pass1.toLowerCase() == "ratedr") {\nalert(\'You Got it Right!\');\ndocument.write("<center><h1>Username : [email protected]<p>Password : h7s6sh</p></h1><p>NOTE : Visit daily Everyday</p><p><h1>Thank You!</h1></p></center>");\nbreak;\n} \ntestV+=1;\nvar pass1 = \nprompt(\'Access Denied - Password Incorrect, Please Try Again.\',\'Password\');\n}\nif (pass1.toLowerCase()!="password" & testV ==3) \nhistory.go(-1);\nreturn " ";\n} \n\n'

>>> script_mim.find_all('p')

[]

Why does find_all('p') return an empty list? I'm using the latest version of Python 3.x. Can you tell me what I'm doing wrong and how to fix it?


Solution

  • Beautiful Soup parses only the markup of the document; it does not parse the contents of a <script> tag, which it keeps as a single string. That is why script_mim.find_all('p') returns an empty list: the <p> tags sit inside a JavaScript string literal, not in the parsed tree. To get the password you have to work on that string yourself. Either find the positions of <p> and </p>, slice that fragment out, and reparse it with Beautiful Soup (method 1), or extract the value with string operations alone (method 2).

    1. To reparse the <p></p> fragment with BeautifulSoup, slice it out of the script text first:

      text = script_mim.text
      fragment = text[text.find('<p>'):text.find('</p>') + len('</p>')]  # slice out the first <p>...</p> fragment
      soup = BeautifulSoup(fragment, "html.parser")  # reparse only that fragment
      soup.find("p").get_text()  # text of the <p> tag, i.e. the password line
      
    2. To extract the password line with string operations alone:

      start = script_mim.text.find('<p>') + len('<p>')
      script_mim.text[start:script_mim.text.find('</p>')]  # slice between <p> and the first </p>
      

    I would recommend method 2, since plain string operations are cheaper than building a second BeautifulSoup tree as in method 1.
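
As a rough illustration of the string-based approach, here is a minimal sketch that works on the script text alone, so it needs only the standard library. SCRIPT_TEXT is a shortened, hypothetical stand-in for the script text from the question, and the helper names extract_paragraph and extract_password are made up for this example. The regular expression assumes the page keeps the "Password : value" layout shown above.

```python
import re

# Shortened, hypothetical stand-in for the <script> text in the question.
SCRIPT_TEXT = (
    'function passWord() {\n'
    'document.write("<center><h1>Username : someone'
    '<p>Password : h7s6sh</p></h1></center>");\n'
    '}\n'
)


def extract_paragraph(js_text):
    """Return the text between the first <p> and the next </p>, or None."""
    start = js_text.find("<p>")
    if start == -1:
        return None
    start += len("<p>")
    end = js_text.find("</p>", start)
    if end == -1:
        return None
    return js_text[start:end].strip()


def extract_password(js_text):
    """Return the value after 'Password :' in a <p> tag, or None."""
    # Assumes the 'Password : value' layout seen in the question.
    match = re.search(r"<p>\s*Password\s*:\s*([^<\s]+)\s*</p>", js_text)
    return match.group(1) if match else None


print(extract_paragraph(SCRIPT_TEXT))
print(extract_password(SCRIPT_TEXT))
```

With the objects from the question, the same helpers could be called on the script tag's text, for example extract_paragraph(script_mim.text). Checking for -1 and for a failed match avoids silently slicing the wrong region if the page layout changes.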