javascript python webrequest screen-scraping

Trouble getting a phone number while parsing the data inside a script tag


Code

import requests
from bs4 import BeautifulSoup as bs

my_url = 'https://www.olx.com.pk/item/oppo-f17-pro8128-iid-1034320813'

with requests.session() as s:
    r = s.get(my_url)
    page_html = bs(r.content, 'html.parser')
    # Collect every <script> tag on the page
    safe = page_html.find_all('script')
    print("The number of script tags is {0}:".format(len(safe)))
    # Print each script tag whose text contains "+92"
    for i in safe:
        if "+92" in str(i):
            print(i)

Query

I want to get the phone number that is present in window.state using a Python script, but I do not know how to parse window.state. I would be very thankful if you could help me with this problem. Thanks in advance!


Solution

  • As I mentioned in the comments, window.state is inside the 7th <script> tag.

    I extracted the contents of that script tag, searched the string for phoneNumber, found its index, and sliced out the data that you need.

    Extracting the data from JSON would be easier, but the data isn't in JSON format.

    import bs4 as bs
    import requests
    
    url = 'https://www.olx.com.pk/item/oppo-f17-pro8128-iid-1034320813'
    resp = requests.get(url)
    
    # Convert the response text to HTML soup object
    soup = bs.BeautifulSoup(resp.text, 'html.parser')
    
    # Select the 7th script tag (index 6); that is where the data you need is present
    s = soup.find_all('script')[6]
    
    # Extract the contents of the script. This will be a string.
    f = s.contents[0]
    
    # Find the index of the substring "phoneNumber" - the data that you need.
    idx = f.index('phoneNumber')
    
    # Slice from the opening quote (idx - 1) through the closing quote of the value.
    # The fixed width of 29 characters assumes a 13-character number.
    print(f[idx-1: idx + 28])
    
    
    # Output
    
    "phoneNumber":"+923077250739"