So I have this href element and I'm trying to print only the number inside it. However, the part of the path after that number also contains digits, so I'm not sure how to grab only the ID instead of also printing the digits in Mad1000.
The href: https://www.game.com/items/20573078/Mad1000
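import re
from selenium.webdriver.common.by import By
# Selenium 4 removed find_element_by_xpath; find_element(By.XPATH, ...) replaces it
userLink = driver.find_element(By.XPATH, f"//*[@id='bc_owners_table']/tbody/tr[{i+1}]/td[7]/a").get_attribute("href")
userID = re.sub('[^0-9]', '', userLink)  # removes every non-digit, so the 1000 in Mad1000 survives too
print(userID)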
The output ends up being 205730781000, but I only want to print 20573078. How can I achieve this?
There are four ways to do this:
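import re
href = "https://www.game.com/items/20573078/Mad1000"
# 1. Split on "/" and keep only the segments made entirely of digits
userID = [int(s) for s in href.split("/") if s.isdigit()]
print(userID[0])
# 2. Find every run of digits with a regex and take the first one
userID = re.findall(r'\d+', href)
print(userID[0])
# 3. Split on "/" and take the segment at a fixed index
userID = href.split("/")[4]
print(userID)
# 4. Strip the non-digits (your method), then cut off the last four characters
userID = re.sub('[^0-9]', '', href)[:-4]
print(userID)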
Let me explain how each one works. Note that I used a variable named href, but you can swap in your userLink variable and it will work the same way.
The first method splits the string into a list every time there is a /. It then checks each item in the list with isdigit(), which is True only when the item consists entirely of digits, and converts the matching items to integers with int(). The result is a list, so we use userID[0] to get the first (and usually only!) element. Mad1000 does not end up in the list because it contains letters as well as digits, so isdigit() returns False for it. The list will only contain segments that are pure numbers.
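To see what method 1 is working with, here is a small standalone sketch that hardcodes the example URL from the question (in your script the string would come from userLink) and prints the intermediate list:

```python
href = "https://www.game.com/items/20573078/Mad1000"

# Splitting on "/" gives every piece of the URL as a separate string.
# The empty string comes from the two slashes in "https://".
parts = href.split("/")
print(parts)  # ['https:', '', 'www.game.com', 'items', '20573078', 'Mad1000']

# isdigit() is True only when every character is a digit,
# so "Mad1000" (letters plus digits) is filtered out.
userID = [int(s) for s in parts if s.isdigit()]
print(userID)     # [20573078]
print(userID[0])  # 20573078
```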
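The second method uses re.findall to return every run of digits in the string as a list. This time 1000 is included, because it is a run of digits too, so we use userID[0] to get the first element of the list, which is 20573078 (as a string, since findall returns strings, not integers). This works because there are no digits before the ID in this URL. If the href changes so that digits appear earlier, for example in the domain name, the first match will no longer be the ID.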
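Here is a sketch of that caveat. The second URL is hypothetical (a digit added to the domain) and only exists to show how findall's first match can change. The item_id helper is a swapped-in alternative, not one of the four methods above, and it assumes the ID always comes directly after /items/ in the path:

```python
import re

href = "https://www.game.com/items/20573078/Mad1000"
# Hypothetical URL with a digit in the domain, used only to show the pitfall.
other = "https://www.game2.com/items/20573078/Mad1000"

# findall returns every run of digits, as strings, from left to right.
print(re.findall(r"\d+", href))      # ['20573078', '1000']
print(re.findall(r"\d+", other)[0])  # 2  (the domain's digit comes first)


def item_id(url):
    # Assumption: the ID is the run of digits right after "/items/".
    match = re.search(r"/items/(\d+)", url)
    return int(match.group(1)) if match else None


print(item_id(href))   # 20573078
print(item_id(other))  # 20573078
```

Anchoring the pattern on the surrounding path keeps digits elsewhere in the URL (domain or username) from interfering, at the cost of breaking if the site changes its URL layout.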
The third method splits the string into a list by / again. The difference is that this time we take the element at index 4 straight away. Python lists are zero-indexed, so that is actually the fifth item (the list starts with 'https:' and an empty string produced by the //). Depending on the hyperlink, you might need a different index. This is a shorter alternative to method 1, which does the same split but also checks that each value is a number.
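If you are not sure which index to use, one quick way to find out is to print every segment with its position. This sketch hardcodes the example URL; the negative-index line at the end is an alternative that assumes the username is always the last segment, so the ID is second from the end:

```python
href = "https://www.game.com/items/20573078/Mad1000"

# Show each segment next to its index (indices start at 0).
for index, segment in enumerate(href.split("/")):
    print(index, repr(segment))
# Prints:
# 0 'https:'
# 1 ''
# 2 'www.game.com'
# 3 'items'
# 4 '20573078'
# 5 'Mad1000'

userID = href.split("/")[4]
print(userID)  # 20573078

# Assumption: the username is always the final segment, so count from the end.
userID_from_end = href.split("/")[-2]
print(userID_from_end)  # 20573078
```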
The final method gets the number using your original approach, but removes the last four characters using [:-4]. This only works as long as the username ends in exactly four digits.
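A quick way to see that limitation: the sketch below wraps method 4 in a small helper (the name id_by_slicing is just for illustration) and runs it on the question's URL and on a hypothetical URL whose username ends in two digits instead of four:

```python
import re


def id_by_slicing(href):
    # Method 4: keep only the digits, then drop the last four characters.
    return re.sub("[^0-9]", "", href)[:-4]


print(id_by_slicing("https://www.game.com/items/20573078/Mad1000"))  # 20573078
# Hypothetical username "Mad42": only two trailing digits, so the slice
# also cuts the last two digits of the ID.
print(id_by_slicing("https://www.game.com/items/20573078/Mad42"))    # 205730
```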
None of these methods are perfect, but they should work for what you want.