The title explains all, also, I have tried removing them
(because the text is there, but instead of "aldo" there is "al?do", also it seems to have a random pattern)
with (String).replace("?", "")
, but with no success.
I have also used this, with a combination of UTF_8,UTF_16 and ISO-8859, with no success.
byte[] ptext = tempName.getBytes(UTF_8);
String tempName1 = new String(ptext, UTF_16);
An example of what I am getting:
Studded Regular Sweatshirt // Instead of this
S?tudde?d R?eg?ular? Sw?eats?h?irt // I get this
Could it be the website that notices the headless browser and tries to "spoof" its content? How can I overcome this?
It looks very likely that site you scrapping intent mix up the 3f
and 64
characters into your result.
so you have to mask your self as a normal browser to scrapping or filter it out by replacing.
text simple
Sca???rfa???ce??? E???mbr???oi�d???ered L�e???athe
after filteration
Scarface Embroidered Leather
//Sca???rfa???ce??? E???mbr???oi�d???ered L�e???athe
//Scarface Embroidered Leathe
String hex="5363613f3f3f7266613f3f3f63653f3f3f20453f3f3f6d62723f3f3f6f69643f3f3f65726564204c653f3f3f61746865";
byte[] bytes= hexStringToBytes(hex);
//the only line you need
String res = new String(bytes,"UTF-8").replaceAll("\\\u003f","").replaceAll('�',"").replaceAll("�","");
private static byte charToByte(char c) {
return (byte) "0123456789ABCDEF".indexOf(new String(c));
}
public static byte[] hexStringToBytes(String hexString) {
if (hexString == null || hexString.equals("")) {
return null;
}
hexString = hexString.toUpperCase();
int length = hexString.length() / 2;
char[] hexChars = hexString.toCharArray();
byte[] d = new byte[length];
for (int i = 0; i < length; i++) {
int pos = i * 2;
d[i] = (byte) (charToByte(hexChars[pos]) << 4 | charToByte(hexChars[pos + 1]));
}
return d;
}
public static String bytesToHexString(byte[] src){
StringBuilder stringBuilder = new StringBuilder("");
if (src == null || src.length <= 0) {
return null;
}
for (int i = 0; i < src.length; i++) {
int v = src[i] & 0xFF;
String hv = Integer.toHexString(v);
if (hv.length() < 2) {
stringBuilder.append(0);
}
stringBuilder.append(hv);
}
return stringBuilder.toString();
}
public String printHexString( byte[] b) {
String a = "";
for (int i = 0; i < b.length; i++) {
String hex = Integer.toHexString(b[i] & 0xFF);
if (hex.length() == 1) {
hex = '0' + hex;
}
a = a+hex;
}
return a;
}