Search code examples
pythondelphicharacter-encodingescaping

Can anyone identify this encoding?


Im working on a program to interface with some hardware that is sending data that has been encoded and wrapped to send within a CDATA block in an XML document.

the software in the device as far as I know is written in Python and i'm writing the interface in Delphi.

the data the device sends is this

\x00E\x18\x10\x14}*UF!A\x81\xac\x08&\x02\x01\n\x15\x1a\xc2PP\x92\x17\xc2\xc1\xa0\x0e\x1a\xc2\xd0K\x94\'\x830\x11\x8b \x84a_\xa0+\x04\x81\x17\x89\x15D\x91B\x05.\x84\xf1\x1b\x89%E\x00\x04\x9c\x0e\xc5\xc1=\x87\x0bE\xf18\x07\x1f\xc8a\xa5\x95\x08H\x80?\x84\x18\tPK\x8a$\t\xf1\xb2\x8e(J\xb0\x08\x91\x1eJ\xf0W\x0c-\x0b\xf0\x0e\x88\x07\x0c\x00\x9b\n \x910Z\x06!\x92\xf0W\x073S \x08\x87\xff\xff\xff\xf0\x0e\xff\xff\xff\xff\xff\xf3\x10\x0e\xba\xff\xff\xff\xf4C \xed\xbb\xb9_\xffDD1\r\xcb\xbaw\xf5TD2\xed\xbb\xba\x88EUDB\x0c\xba\xaa\x99UUDB\x0c\xba\xaa\xa9UUD2\r\xbb\xaa\xaaUTD2\r\xcb\xbb\xaaUTC!\r\xcb\xbb\xbbUD3!\x0e\xdc\xbb\xbbDD3!\x0e\xdc\xcc\xbbDC2!\x0e\xdc\xcc\xcc33"\x11\x0e\xdd\xcc\xccC3"\x11\x0e\xed\xdc\xcc\xf33!\x10\x0e\xee\xdd\xcc\xf32!\x10\x0e\xee\xdd\xdc\xff2!\x10\x00\xee\xee\xdd\xff\xf2!\x11\x00\x0e\xee\xdd\xff\xf2!\x11\x10\x0e\xee\xef\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00

I need to be able to send similar data back to the device but the format i have it in is this

0x451C0E148A4A65781B0291080E1644B0680B340580A28615C9001E8F1EC9F0559D260A4147901A0AF16D93304BC09A8523CC513E25218CA00CD42C0CE137891FCDB02397054DD07C04124E112408158E5124841E0ED17F8E28CEE12C96284F511B231C8FB07C1228D09079BD31D090960B2050B075871CD1217B8D171131830B3552509A8E295271621D2E9271AD972ED371AB93FFFCDFFFFFFFFFFFFCDDFFFFFFFFFFBCCDE0122FFFFFBCCDE01123FFFBBBCDE011234FFAABCDE001233FFAABCCE001234FFAAABCDE01234FFAAABCDE01234FFAAAABCE01344FAAA99ABE12344FAAA99ABE124555FAA99AAC044555FAA9999A96655FFAA9989998765FFAA98899BB855FFAA9999ABBD45FFAA9999ABCD34FAAA999AABCD345FAAA999BBCD33F000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

I know that \x is usually used to represent ascii chars using their hex values in 2 digit pairs but looking at the data this not the case. i'm struggling to identify the encoding used and the manufacturer is not providing much help.

what I want to know is how do I convert the hex I have to the format they are using in Delphi xe4?

BTW the two block do not contain the same data but it is the same type of data ie the format is the same just different encoding

example of the data being sent

POST ******** HTTP/1.1 Host: 172.16.1.136:8080 Accept-Encoding: identity Content-Length: 1552 Content-Type: text/xml Authorization: 1344354:PASS User-Agent: *********

<?xml version="1.0" encoding="utf-8"?> <Biometrics>   <Templates>
     <Template badge="1075" readerType="6" index="6" ts="2014-11-06T17:28:40.000+01:00" chk="3a6a4924ec04e668186b15e244e6fe73">   <![CDATA[     ['1075_6',
1415294920.3754971, [0, 0], [['3\x04\x00\x00\x00P\x00\x00E\x18\x10\x14}*UF!A\x81\xac\x08&\x02\x01\n\x15\x1a\xc2PP\x92\x17\xc2\xc1\xa0\x0e\x1a\xc2\xd0K\x94\'\x830\x11\x8b \x84a_\xa0+\x04\x81\x17\x89\x15D\x91B\x05.\x84\xf1\x1b\x89%E\x00\x04\x9c\x0e\xc5\xc1=\x87\x0bE\xf18\x07\x1f\xc8a\xa5\x95\x08H\x80?\x84\x18\tPK\x8a$\t\xf1\xb2\x8e(J\xb0\x08\x91\x1eJ\xf0W\x0c-\x0b\xf0\x0e\x88\x07\x0c\x00\x9b\n \x910Z\x06!\x92\xf0W\x073S \x08\x87\xff\xff\xff\xf0\x0e\xff\xff\xff\xff\xff\xf3\x10\x0e\xba\xff\xff\xff\xf4C \xed\xbb\xb9_\xffDD1\r\xcb\xbaw\xf5TD2\xed\xbb\xba\x88EUDB\x0c\xba\xaa\x99UUDB\x0c\xba\xaa\xa9UUD2\r\xbb\xaa\xaaUTD2\r\xcb\xbb\xaaUTC!\r\xcb\xbb\xbbUD3!\x0e\xdc\xbb\xbbDD3!\x0e\xdc\xcc\xbbDC2!\x0e\xdc\xcc\xcc33"\x11\x0e\xdd\xcc\xccC3"\x11\x0e\xed\xdc\xcc\xf33!\x10\x0e\xee\xdd\xcc\xf32!\x10\x0e\xee\xdd\xdc\xff2!\x10\x00\xee\xee\xdd\xff\xf2!\x11\x00\x0e\xee\xdd\xff\xf2!\x11\x10\x0e\xee\xef\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00']]] ]]> </Template>

  </Templates> </Biometrics> HTTP/1.1 200 OK Server: Apache-Coyote/1.1 Content-Type: text/xml;charset=ISO-8859-1 Transfer-Encoding: chunked Date: Thu, 06 Nov 2014 17:28:41 GMT

52 <?xml version="1.0" encoding="UTF-8"?><OperationStatus uid="">OK</OperationStatus> 0

These are biometric templates used by the Suprema Reader if that helps.

Solution

I have successfully deciphered what is going on with this now. to turn my original hex string into the required format I'm using this code, hope this helps someone else in the future. Please feel free to comment and make suggestions to improve the code.

class function TConvert.HexToPythonEscAscii(const aHexString: string): string;
var
  i: Integer;
  ByteArray: array of Byte;
begin
  Result := '';

  SetLength(ByteArray, (length(aHexString) div 2) );

  TConvert.HexToBytes(aHexString, ByteArray, length(ByteArray));

  for i := Low(ByteArray) to High(ByteArray) do
  begin
    if ByteArray[i] in [$20..$7E] then
    begin

      case ByteArray[i] of
        $5c : Result := Result +'\\';
        $27 : Result := Result +'\''';
      else
        Result := Result + char(ByteArray[i])
      end;

    end
    else
    begin

      case ansichar(ByteArray[i]) of
        TAB : Result :=  Result + '\t';
        LF  : Result :=  Result + '\n';
        CR  : Result :=  Result + '\r';
      else
        Result :=  Result + '\x' + LowerCase(IntToHex(ByteArray[i], 2));
      end;

    end;
  end;
end;

Solution

  • This looks like binary data held in a Python bytes object. Loosely, bytes that map to printable ASCII characters are presented as those ASCII characters. All other bytes are encoded \x** where ** is the hex representation of the byte.

    >>> b = b'\x00E\x18\x10\x14}*UF!A\x81\xac\x08&\x02\x01\n\x15\x1a\xc2PP\x92'
    >>> str(b)
    '\x00E\x18\x10\x14}*UF!A\x81\xac\x08&\x02\x01\n\x15\x1a\xc2PP\x92'
    >>> ord(b[0])
    0
    >>> ord(b[1])
    69
    >>> ord(b[2])
    24
    >>> ord(b[3])
    16
    >>> ord(b[4])
    20
    >>> ord(b[5])
    125
    >>> ord(b[6])
    42
    >>> bytes(bytearray((0, 69, 24, 16, 20, 125, 42)))
    '\x00E\x18\x10\x14}*'
    >>> bytes(bytearray(range(256)))
    '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15
    \x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;?@ABCDEFGHIJKL
    MNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86
    \x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b
    \x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0
    \xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5
    \xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda
    \xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef
    \xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
    

    The Python documentation describes bytes literals here: https://docs.python.org/3.4/reference/lexical_analysis.html#strings

    As to what the binary means, I presume that you know that.