I'm trying to scrape some profiles of people on LinkedIn for a specific job. To do this, I was trying to find the People button and click it so that only the relevant people are shown.
The path is as follows:
From the signed-out LinkedIn home page -> I sign in and land on the LinkedIn home page -> I type "hr" in the search bar and hit Enter.
On the results page for "hr", on the left side of the page, there is a navigation list titled "On this page". One of its options is "People", and that is what I want to target.
The link to the page is: https://www.linkedin.com/search/results/all/?keywords=hr&origin=GLOBAL_SEARCH_HEADER&sid=Xj2
The HTML of the button for 'People' in the navigation list is:
<li>
<button aria-current="false" class="search-navigation-panel_button" data-target-section-id="PTFmMNSPSz2LQRzwynhRBQ==" role="link" type="button"> People
I have tried to find this button through By.LINK_TEXT with the keyword People, but that did not work. I have also tried By.XPATH with "//button[@data-target-section-id='RIK0XK7NRnS21bVSiNaicw==']", but it does not find the button either.
How can I make Selenium match on this custom attribute so I can find the button through data-target-section-id="PTFmMNSPSz2LQRzwynhRBQ=="?
Another issue that I am having is that I can target all the relevant people on the page and loop through them but I cannot extract the link of each of the profiles. It only takes the first link of the first person and never updates the variable again through the loop.
For example, if the first person is Ian, and the second is Brian, it gives me the link for Ian's profile even if 'users' is Brian.
Debugging the loop I can see the correct list of people in all_users but it only gets the href of the first person in the list and never updates.
Here is the code of that:
all_users = driver.find_elements(By.XPATH, "//*[contains(@class, 'entity-result__title-line entity-result__title-line--2-lines')]")
for users in all_users:
    print(users)
    get_links = users.find_element(By.XPATH, "//*[contains(@href, 'miniProfileUrn')]")
    print(get_links.get_attribute('href'))
It looks like the reason your People button locator isn't working is because the data-target-section-id is dynamic. Mine is showing as hopW8RkwTN2R9dPgL6Fm/w==. We can get around that by using an XPath to locate the element based on the text it contains, "People", e.g.
//button[text()='People']
It turns out that this matches two elements on the page, because many of the left nav links are repeated as rounded buttons at the top of the page, so we can further refine our locator to
//button[text()='People'][@data-target-section-id]
Having said that, that button only scrolls the page, so you don't really need to click it.
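If you do still want a handle on the button, here is a minimal sketch, assuming a Selenium 4 driver that is already signed in and sitting on the results page. The helper name find_people_button is made up for illustration, the literal "xpath" is the string value of By.XPATH, and normalize-space() is swapped in for text() only to tolerate the leading space visible in the HTML you quoted.

```python
# "xpath" is the string value of selenium's By.XPATH locator strategy.
XPATH = "xpath"

# Match on the visible text plus the mere presence of the attribute, so the
# changing value of data-target-section-id never has to be known in advance.
# normalize-space() (an assumption, not part of the locator above) trims
# whitespace around the button text before comparing.
PEOPLE_BUTTON = "//button[normalize-space()='People'][@data-target-section-id]"


def find_people_button(driver):
    """Return the left-nav People button, or None if nothing matches."""
    matches = driver.find_elements(XPATH, PEOPLE_BUTTON)
    return matches[0] if matches else None
```

You would call it as button = find_people_button(driver) and check for None before clicking. If the results are still loading, the lookup may come back empty, so a wait before the call may be needed.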
From there, you want to get the links to each person listed under the People heading. We first need the DIV that contains the People section. It's kinda messy because the IDs on those elements are also dynamic so we need to find the H2 that contains "People" and then work our way back up the DOM to the DIV that contains only that section. We can get that using the XPath below
//div[@class='search-results-container']/div[.//h2[text()='People']]
From there, we want all of the A tags that uniquely link to a person... and there's a lot of A tags in that section but most are not ones we want so we need to do more filtering. I found that the below XPath locates each unique URL in that section.
//a[contains(@href,'miniProfileUrn')][contains(@class,'scale-down')]
Combining the two XPaths, we get
//div[@class='search-results-container']/div[.//h2[text()='People']]//a[contains(@href,'miniProfileUrn')][contains(@class,'scale-down')]
which locates all unique URLs belonging to a person in the People section of the page.
Using this, your code would look like
all_users = driver.find_elements(By.XPATH, "//div[@class='search-results-container']/div[.//h2[text()='People']]//a[contains(@href,'miniProfileUrn')][contains(@class,'scale-down')]")
for user in all_users:
    print(user.get_attribute('href'))
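The same lookup can be wrapped so the two XPath pieces stay readable and the hrefs come back as a list rather than being printed. This is only a sketch: section_links_xpath and profile_links are hypothetical helper names, the heading parameter is an assumption that other sections follow the same layout as People, and "xpath" is the string value of By.XPATH.

```python
XPATH = "xpath"  # string value of selenium's By.XPATH


def section_links_xpath(heading):
    """Build the combined locator: section container first, then its profile anchors."""
    section = (
        "//div[@class='search-results-container']"
        f"/div[.//h2[text()='{heading}']]"
    )
    anchors = "//a[contains(@href,'miniProfileUrn')][contains(@class,'scale-down')]"
    return section + anchors


def profile_links(driver, heading="People"):
    """Return the profile hrefs in page order, skipping empty and repeated values."""
    hrefs = []
    for anchor in driver.find_elements(XPATH, section_links_xpath(heading)):
        href = anchor.get_attribute("href")
        if href and href not in hrefs:
            hrefs.append(href)
    return hrefs
```

Collecting the hrefs into a plain list first means you can navigate to each profile afterwards without holding on to elements from the search page.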
NOTE: The reason your code was only returning the first href repeatedly is that an XPath starting with "//" searches the whole document, even when you call find_element on an existing element. Add a "." at the start of the XPath to tell it to start searching from the referenced element.
get_links = users.find_element(By.XPATH, ".//*[contains(@href, 'miniProfileUrn')]")
                                          ^ add period here
I've eliminated that step in my code so you won't need it there.
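If you would rather keep your original per-result loop, this is roughly what it looks like with the leading "." applied. It is a sketch that reuses the class locator from your question; links_per_card is a hypothetical name, and "xpath" is the string value of By.XPATH.

```python
XPATH = "xpath"  # string value of selenium's By.XPATH

# Class locator taken unchanged from the question.
CARD_XPATH = (
    "//*[contains(@class, "
    "'entity-result__title-line entity-result__title-line--2-lines')]"
)


def links_per_card(driver):
    """Return one profile href per result card, searching inside each card only."""
    links = []
    for card in driver.find_elements(XPATH, CARD_XPATH):
        # The leading "." makes the search relative to this card; without it
        # the "//" would start from the document root on every iteration.
        anchor = card.find_element(XPATH, ".//*[contains(@href, 'miniProfileUrn')]")
        links.append(anchor.get_attribute("href"))
    return links
```

Note that find_element raises an exception when a card has no matching descendant, so cards without a profile link would need their own handling.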