
Extract HTTPS host from first TCP message


I'm trying to extract the host name (as a string) from a still-encrypted HTTPS connection.

The protocol leaves the host name unencrypted, but I can't find the correct way to extract it. Different domains have different lengths, and not all requests look alike.

google.com (first TCP message)

�� T}@����5�;�;��O��KG��Y�  ����~fM�FRH�N��7s�6w��[���ک�>�,�0�̨̩̪�+�/��$�(k�#�'g�
�9�   �3��=<5/�u

google.com

3th2http/1.11
*( 
+-3&$ 
cϼB�Y�j¬��b*a$��n$���}�X�.u�

example.com (first TCP message)

�ę`�ۜ����z#�X��I�&���~�� ��Ao�)���쿂�7�-�������`�l>�,�0�̨̩̪�+�/��$�(k�#�'g�
�9�    �3��=<5/�uexample.com

3th2http/1.11
*(   
+-3&$ a�
���a桵.3�*L_��d�N�yK
*r��

Any ideas?


Solution

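  • This is most likely the TLS protocol with the Server Name Indication (SNI) extension, so you need a parser that understands the first TLS record the client sends. It is not too hard if you only implement parsing of the handshake protocol, specifically the ClientHello message. The host name sits at a variable offset because several fields before it are length-prefixed (session ID, cipher suites, compression methods, and any earlier extensions), so you have to read each length and skip ahead until you reach the server_name extension (type 0). See https://www.rfc-editor.org/rfc/rfc5246 and https://www.rfc-editor.org/rfc/rfc6066#section-3.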
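
The parsing described above can be sketched in Python as follows. This is a minimal illustration, not a complete TLS parser: the function name `extract_sni` is made up for this example, and it assumes the whole ClientHello arrived in the first TCP message as a single TLS record, with a plain (unencrypted) server_name extension. Field sizes follow the ClientHello layout in RFC 5246 and the server_name layout in RFC 6066, section 3.

```python
import struct
from typing import Optional


def extract_sni(data: bytes) -> Optional[str]:
    """Return the SNI host name from a TLS ClientHello, or None.

    Hypothetical helper: assumes the full ClientHello is contained in
    `data` as one TLS record (no fragmentation across TCP segments).
    """
    try:
        # TLS record header: content type (1), version (2), length (2).
        if data[0] != 0x16:  # 0x16 = handshake record
            return None
        pos = 5
        # Handshake header: message type (1), length (3).
        if data[pos] != 0x01:  # 0x01 = ClientHello
            return None
        pos += 4
        pos += 2 + 32  # client version (2) + random (32)
        pos += 1 + data[pos]  # session ID: 1-byte length + value
        (suites_len,) = struct.unpack_from("!H", data, pos)
        pos += 2 + suites_len  # cipher suites: 2-byte length + value
        pos += 1 + data[pos]  # compression methods: 1-byte length + value
        (ext_total,) = struct.unpack_from("!H", data, pos)
        pos += 2
        end = min(pos + ext_total, len(data))
        # Walk the extensions: each is type (2), length (2), body.
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", data, pos)
            pos += 4
            if ext_type == 0:  # 0 = server_name
                # Body: list length (2), name type (1), name length (2), name.
                _list_len, name_type, name_len = struct.unpack_from(
                    "!HBH", data, pos
                )
                if name_type != 0:  # 0 = host_name
                    return None
                name = data[pos + 5 : pos + 5 + name_len]
                if len(name) != name_len:  # truncated input
                    return None
                return name.decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        # Input too short or not shaped like a ClientHello.
        return None
    return None
```

Call it with the raw bytes of the first TCP message, not with a decoded string: `host = extract_sni(first_message)`. It returns `None` when the data is not a ClientHello or carries no server_name extension, so the caller should be prepared for that case (for example, clients that connect by IP address usually send no SNI).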