Tags: python, angular, selenium-webdriver, xpath, webdriverwait

Scraping Angular data with Selenium


Below is some HTML from which I can extract text with the Selenium driver.

<td colspan="2"><strong>Owner</strong>
                <div ng-class="{'owner-overflow' : property.ownersInfo.length > 4}">
                    <!-- ngRepeat: owner in property.ownersInfo --><div ng-repeat="owner in property.ownersInfo" class="ng-scope">
                        <div class="ng-binding">ERROL P BROWN LLC 
                            <!-- &nbsp;&nbsp; <span ng-if="owner.shortDescription != null && owner.shortDescription.length > 0">({{owner.shortDescription}})</span> -->
                        </div>
                    </div><!-- end ngRepeat: owner in property.ownersInfo -->
                </div>
            </td>    

<td colspan="2" class="pi_mailing_address"><strong>Mailing Address</strong>
                <div>
                    <span class="ng-binding">1784 NE 163 ST </span>
                    <span ng-show="property.mailingAddress.address2" class="ng-binding ng-hide"></span>
                    <span ng-show="property.mailingAddress.address3" class="ng-binding ng-hide"></span>
                    <span ng-show="property.mailingAddress.city" ng-class="{'inline':property.mailingAddress.city}" class="ng-binding inline">NORTH MIAMI,</span>
                    <span class="inline ng-binding">FL</span>
                    <span class="inline ng-binding">33162</span>
                    <span ng-hide="isCountryUSA(property.mailingAddress.country)" class="ng-binding ng-hide">USA</span>
                </div>
            </td>

When I run the code manually, all the fields are picked up without issue. However, if I run the script in a loop to extract this data, these elements come back blank. The other fields I am collecting are not blank. There is no error during processing; it is just that when I save the data to a database, these values are empty. Is there a workaround to prevent this?

These are the lines of code:

Owner = driver.find_element(By.XPATH, "//strong[text()='Owner']//following::div[1]").text
SubDivision = driver.find_element(By.XPATH, "//strong[text()='Sub-Division:']//following::div[1]").text
Address1 = driver.find_element(By.XPATH, "//strong[text()='Mailing Address']//following::div[1]//following::span[1]").text
Address2 = driver.find_element(By.XPATH, "//strong[text()='Mailing Address']//following::div[1]//following::span[2]").text
Address3 = driver.find_element(By.XPATH, "//strong[text()='Mailing Address']//following::div[1]//following::span[3]").text
city = driver.find_element(By.XPATH, "//strong[text()='Mailing Address']//following::div[1]//following::span[4]").text.replace(",", "")
state = driver.find_element(By.XPATH, "//strong[text()='Mailing Address']//following::div[1]//following::span[5]").text
zipcode = driver.find_element(By.XPATH, "//strong[text()='Mailing Address']//following::div[1]//following::span[6]").text

Solution

  • If you can extract the required text in a standalone run but not in a loop, the likely cause is a race condition between the browser and your script: the locator runs before Angular has finished rendering the bound values.


    Solution

    Because the elements are rendered by Angular, you should ideally use WebDriverWait with visibility_of_element_located() before reading the text. You can use the following locator strategies:

    Owner = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//strong[text()='Owner']//following::div[1]"))).text
    SubDivision = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//strong[text()='Sub-Division:']//following::div[1]"))).text
    Address1 = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//strong[text()='Mailing Address']//following::div[1]//following::span[1]"))).text
    Address2 = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//strong[text()='Mailing Address']//following::div[1]//following::span[2]"))).text
    Address3 = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//strong[text()='Mailing Address']//following::div[1]//following::span[3]"))).text
    city = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//strong[text()='Mailing Address']//following::div[1]//following::span[4]"))).text.replace(",", "")
    state = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//strong[text()='Mailing Address']//following::div[1]//following::span[5]"))).text
    zipcode = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//strong[text()='Mailing Address']//following::div[1]//following::span[6]"))).text
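One caveat with the waits above: in the HTML from the question, the second and third address spans carry the ng-hide class, so a visibility wait on them would be expected to time out rather than return an empty string. The following is a minimal sketch of one way around that, assuming the spans are direct children of the first div after the "Mailing Address" label. The helper names `read_address_spans` and `split_address` and the dictionary keys are hypothetical, and the plain string "xpath" is the value of `By.XPATH`, used here only so that the snippet needs no imports.

```python
# XPath for the spans directly inside the first <div> after the label.
MAILING_SPANS = "//strong[text()='Mailing Address']/following::div[1]/span"


def read_address_spans(driver, xpath=MAILING_SPANS):
    """Return the text of each address span; hidden spans yield ''."""
    # find_elements returns a (possibly empty) list instead of raising.
    spans = driver.find_elements("xpath", xpath)  # "xpath" is the value of By.XPATH
    # WebElement.text only contains visible text, so ng-hide spans give "".
    return [span.text.strip() for span in spans]


def split_address(parts):
    """Map the span texts onto named fields (hypothetical field names)."""
    padded = list(parts) + [""] * 6  # pad so short lists do not raise
    address1, address2, address3, city, state, zipcode = padded[:6]
    return {
        "address1": address1,
        "address2": address2,
        "address3": address3,
        "city": city.replace(",", "").strip(),
        "state": state,
        "zipcode": zipcode,
    }
```

A possible usage, once a wait on a reliably visible element such as the Owner div has succeeded, would be `fields = split_address(read_address_spans(driver))`.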
    

    Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

    You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
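If a field still comes back blank inside the loop, one alternative is to poll until the text itself is non-empty instead of waiting only for visibility. The sketch below is a plain-Python illustration of that idea and not part of Selenium; `poll_for_text` and its parameters are hypothetical names, and it assumes that an empty string means Angular has not yet filled in the binding.

```python
import time


def poll_for_text(read_text, timeout=20.0, interval=0.5):
    """Call read_text() until it returns non-empty text or the timeout elapses.

    read_text is any zero-argument callable, for example
    lambda: driver.find_element(By.XPATH, "...").text
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            text = read_text().strip()
        except Exception:
            # Broad on purpose for this sketch: treat "element not there yet"
            # or a stale reference the same as "no text yet".
            text = ""
        if text:
            return text
        if time.monotonic() >= deadline:
            return ""  # give up and let the caller decide what to do
        time.sleep(interval)
```

With the locators from the question, this could be called as `Owner = poll_for_text(lambda: driver.find_element(By.XPATH, "//strong[text()='Owner']//following::div[1]").text)`. Fields that are legitimately empty, such as a hidden second address line, should not be read this way, because the call would only return after the full timeout.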