I would like to extract email addresses with rvest from this link. However, JavaScript on the page masks the mailto href.
How can I improve the following code?
library(rvest)
library(stringr)

uni <- "https://uni-tuebingen.de/fakultaeten/philosophische-fakultaet/fachbereiche/asien-orient-wissenschaften/indologie/mitarbeiter/"
r <- read_html(uni)
a <- r %>%
  html_nodes("a") %>%
  html_attrs() %>%
  as.character() %>%
  str_subset("mailto:") %>%
  str_remove("mailto:")
Thanks in advance
def decryptCharcode(n, start, end, offset):
    # Shift one character by `offset`, wrapping around inside [start, end].
    n = ord(n) + offset
    if offset > 0 and n > end:
        n = start + (n - end - 1)
    elif offset < 0 and n < start:
        n = end - (start - n - 1)
    return chr(n)


def decryptString(enc, offset):
    dec = ""
    length = len(enc)
    # The last 3 characters are skipped: they are the trailing "%27"
    # (a URL-encoded single quote), not part of the address.
    for i in range(length - 3):
        n = enc[i]
        if 0x2B <= ord(n) <= 0x3A:      # + , - . / 0-9 :
            dec += decryptCharcode(n, 0x2B, 0x3A, offset)
        elif 0x40 <= ord(n) <= 0x5A:    # @ A-Z
            dec += decryptCharcode(n, 0x40, 0x5A, offset)
        elif 0x61 <= ord(n) <= 0x7A:    # a-z
            dec += decryptCharcode(n, 0x61, 0x7A, offset)
        else:
            dec += enc[i]
    return dec


email = "%27ocknvq%2Cuvqemgt0ygtpgtBdnwgykp0ej%27"
# "%27ocknvq%2C" is the URL-encoded, encrypted form of "'mailto:".
if "%27ocknvq%2C" in email:
    email = email.replace("%27ocknvq%2C", "")
    email = decryptString(email, -2)
    # After decryption, "%3A%0D" stands in for a hyphen; map it back.
    if "%3A%0D" in email:
        email = email.replace("%3A%0D", "-")
    print(email)
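I converted the JS code to Python. It shifts each character back by 2 within its own range (digits and punctuation, uppercase letters, lowercase letters). Reference: https://gist.github.com/InsanityMeetsHH/c38f513f28d6f9b778912f110c565348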
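
To apply the same decryption to every masked link on a page, something like the following might work. It is only a sketch under assumptions: it assumes the hrefs look like javascript:linkTo_UnCryptMailto(%27...%27); (the function name and the SAMPLE_HTML snippet are assumptions, built from the example string above, not taken from the actual page), and the helper names shift_char, decrypt and extract_emails are made up for illustration. It URL-decodes the argument first, so it does not reproduce the "%3A%0D" special case from the code above.

```python
import re
from urllib.parse import unquote

# Code-point ranges that wrap around independently (same ranges as above).
RANGES = [(0x2B, 0x3A), (0x40, 0x5A), (0x61, 0x7A)]

# ASSUMPTION: the masked links call a function with this name.
PATTERN = re.compile(r"linkTo_UnCryptMailto\(([^)]*)\)")


def shift_char(ch, offset):
    """Shift one character by offset, wrapping inside its range."""
    code = ord(ch)
    for start, end in RANGES:
        if start <= code <= end:
            size = end - start + 1
            return chr(start + (code - start + offset) % size)
    return ch  # characters outside all ranges are left unchanged


def decrypt(enc, offset=-2):
    return "".join(shift_char(ch, offset) for ch in enc)


def extract_emails(html, offset=-2):
    emails = []
    for raw in PATTERN.findall(html):
        # "%27ocknvq%2C...%27" -> "'ocknvq,...'" -> "ocknvq,..."
        enc = unquote(raw).strip("'\"")
        dec = decrypt(enc, offset)
        if dec.startswith("mailto:"):
            dec = dec[len("mailto:"):]
        emails.append(dec)
    return emails


# Hypothetical snippet for illustration only.
SAMPLE_HTML = (
    '<a href="javascript:linkTo_UnCryptMailto('
    '%27ocknvq%2Cuvqemgt0ygtpgtBdnwgykp0ej%27);">mail</a>'
)

print(extract_emails(SAMPLE_HTML))  # ['stocker.werner@bluewin.ch']
```

The page source could be fetched with any HTTP client and passed to extract_emails as a string; the offset of -2 matches the example above but may differ on other sites.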