Is there a way to use Jsoup to extract an email address that is protected by this piece of code:
<a href="/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="29484e4a404a4c50404469484e4a404a4c504044074a4644">[email protected]</a>
When I parse the page with the standard call
elements.select(".email").text();
it returns [email protected] instead of the real address.
I tried googling this but found only unrelated information.
The email address is "encrypted" by XORing every character with a randomly generated key byte, which is stored as the first byte of the data-cfemail hex string. To decrypt the address, decode the hex string into a byte array, take the first byte as the key, and XOR each of the remaining bytes with it.
For example (in Python):
In [1]: cfemail = '29484e4a404a4c50404469484e4a404a4c504044074a4644'
In [2]: encoded_bytes = bytes.fromhex(cfemail)
In [3]: encoded_bytes
Out[3]: b')HNJ@JLP@DiHNJ@JLP@D\x07JFD'
In [4]: bytes(byte ^ encoded_bytes[0] for byte in encoded_bytes[1:])
Out[4]: b'agcieyim@agcieyim.com'