Pack200 / Network Transfer Format Spec format specification for SourceDebugExtension attribute


If you try to pack the spring-context 5.0.1.RELEASE JAR with pack200, the packer complains that it does not know the class attribute SourceDebugExtension, which is used in a couple of classes in that JAR that were compiled from Kotlin sources.


JSR-045 defines this attribute as follows:

The SourceDebugExtension attribute is an optional attribute in the attributes table of the ClassFile structure. There can be no more than one SourceDebugExtension attribute in the attributes table of a given ClassFile structure.

The SourceDebugExtension attribute has the following format:


    SourceDebugExtension_attribute {
       u2 attribute_name_index;
       u4 attribute_length;
       u1 debug_extension[attribute_length];
    }

The items of the SourceDebugExtension_attribute structure are as follows:

attribute_name_index
    The value of the attribute_name_index item must be a valid index into the constant_pool table. The constant_pool entry at that index must be a CONSTANT_Utf8_info structure representing the string "SourceDebugExtension".

attribute_length
    The value of the attribute_length item indicates the length of the attribute, excluding the initial six bytes. The value of the attribute_length item is thus the number of bytes in the debug_extension[] item.

debug_extension[]
    The debug_extension array holds a string, which must be in UTF-8 format. There is no terminating zero byte.

    The string in the debug_extension item will be interpreted as extended debugging information. The content of this string has no semantic effect on the Java Virtual Machine.
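
As a minimal sketch of the structure quoted above, the following Java reads one such attribute from a buffer positioned at attribute_name_index. The class and method names are made up for illustration, the name index is not resolved against a constant pool, and decoding with plain UTF-8 is an assumption taken from the JSR-045 wording.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class SourceDebugExtensionReader {
    /** Reads attribute_name_index, attribute_length and debug_extension[] in that order. */
    static String readDebugExtension(ByteBuffer buf) {
        // ByteBuffer is big-endian by default, which matches the class file format.
        int nameIndex = buf.getShort() & 0xFFFF;      // u2, skipped here (hypothetical caller checks it)
        long length = buf.getInt() & 0xFFFFFFFFL;     // u4, excludes the six header bytes
        if (length > buf.remaining()) {
            throw new IllegalArgumentException(
                "attribute #" + nameIndex + " claims " + length + " bytes, only "
                    + buf.remaining() + " available");
        }
        byte[] debugExtension = new byte[(int) length];
        buf.get(debugExtension);                      // no terminating zero byte to strip
        return new String(debugExtension, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] body = "SMAP\nA.kt\nKotlin\n*E\n".getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(6 + body.length);
        buf.putShort((short) 1).putInt(body.length).put(body).flip();
        System.out.print(readDebugExtension(buf));
    }
}
```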

The Network Transfer Format Spec describes how to declare the layout of such attributes, so that pack200 can handle them.


Pack200 lets you skip files that contain such attributes, strip the attributes, or define their layout according to the Network Transfer Format Spec. Unfortunately, I could not get the layout specifier right so that the attribute is parsed correctly. Here is an example hexdump of the actual data that the layout specifier has to match, that is, the value of debug_extension[]:

00000b90:                   53 4d  41 50 0a 42 65 61 6e 44 ;      SMAP.BeanD
00000ba0: 65 66 69 6e 69 74 69 6f  6e 44 73 6c 2e 6b 74 0a ;efinitionDsl.kt.
00000bb0: 4b 6f 74 6c 69 6e 0a 2a  53 20 4b 6f 74 6c 69 6e ;Kotlin.*S Kotlin
00000bc0: 0a 2a 46 0a 2b 20 31 20  42 65 61 6e 44 65 66 69 ;.*F.+ 1 BeanDefi
00000bd0: 6e 69 74 69 6f 6e 44 73  6c 2e 6b 74 0a 6f 72 67 ;nitionDsl.kt.org
00000be0: 2f 73 70 72 69 6e 67 66  72 61 6d 65 77 6f 72 6b ;/springframework
00000bf0: 2f 63 6f 6e 74 65 78 74  2f 73 75 70 70 6f 72 74 ;/context/support
00000c00: 2f 42 65 61 6e 44 65 66  69 6e 69 74 69 6f 6e 44 ;/BeanDefinitionD
00000c10: 73 6c 24 62 65 61 6e 24  31 24 63 75 73 74 6f 6d ;sl$bean$1$custom
00000c20: 69 7a 65 72 24 31 0a 2a  4c 0a 31 23 31 2c 32 37 ;izer$1.*L.1#1,27
00000c30: 33 3a 31 0a 2a 45 0a                             ;3:1.*E.
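
The three options mentioned before the hexdump correspond to packer properties. As a hedged sketch, the snippet below only builds the property maps: the key strings are the literal values of Pack200.Packer.UNKNOWN_ATTRIBUTE and Pack200.Packer.CLASS_ATTRIBUTE_PFX, while the class and method names are hypothetical. On a JDK that still ships java.util.jar.Pack200 (13 or earlier), such a map could be copied into Pack200.newPacker().properties().

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SourceDebugExtensionPackOptions {
    // Literal values of the corresponding Pack200.Packer constants.
    static final String UNKNOWN_ATTRIBUTE = "pack.unknown.attribute";
    static final String CLASS_ATTRIBUTE_PFX = "pack.class.attribute.";
    static final String KEY = CLASS_ATTRIBUTE_PFX + "SourceDebugExtension";

    /** Class files with unknown attributes are passed through without being packed. */
    static Map<String, String> passFilesThrough() {
        return single(UNKNOWN_ATTRIBUTE, "pass");
    }

    /** The attribute is dropped from the packed class files. */
    static Map<String, String> stripAttribute() {
        return single(KEY, "strip");
    }

    /** The attribute is described by a layout string so the packer can parse it. */
    static Map<String, String> declareLayout(String layout) {
        return single(KEY, layout);
    }

    private static Map<String, String> single(String key, String value) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put(key, value);
        return props;
    }

    public static void main(String[] args) {
        System.out.println(passFilesThrough());
        System.out.println(stripAttribute());
    }
}
```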

Unfortunately I was not able to find the correct format yet. I hope someone here either did this already or has more luck in finding the right format.


Solution

  • Finally I found a working solution myself.

    The format is a bit tricky, as the SourceDebugExtension attribute is defined as a direct UTF-8 string without any terminating character like \0 and in the format string you cannot define something like "take all remaining bytes" or "do until the end of the byte array is reached".

    But after reading up a bit on the possibilities in the format string and on the format of the content of the SourceDebugExtension attribute, I came up with a format that should work in most cases.

    The SourceDebugExtension attribute carries a resolved SMAP. "Resolved" matters here, because an unresolved SMAP can contain embedded SMAPs that already include an end section, which would make matching more complex, though not impossible. A resolved SMAP always ends with <line terminator>*E<line terminator>, where <line terminator> can be any of the usual suspects \r, \n or \r\n, and this sequence cannot appear earlier in a resolved SMAP.
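
To make the end-section rule concrete, here is a small sketch that checks whether a byte array ends with <line terminator>*E<line terminator>. The class and method names are made up for this example, and it only inspects the tail; it does not verify that the SMAP is actually resolved.

```java
import java.nio.charset.StandardCharsets;

final class SmapEndCheck {
    /** True if the data ends with a line terminator, then "*E", then a line terminator. */
    static boolean endsWithEndSection(byte[] smap) {
        int afterE = terminatorStart(smap, smap.length);
        if (afterE == smap.length) return false;            // no trailing terminator
        if (afterE < 2 || smap[afterE - 2] != '*' || smap[afterE - 1] != 'E') return false;
        int star = afterE - 2;
        return terminatorStart(smap, star) < star;           // terminator in front of "*E"
    }

    /** Start index of the terminator (\r\n, \r or \n) that ends at {@code end}, or {@code end} if none. */
    private static int terminatorStart(byte[] b, int end) {
        if (end >= 2 && b[end - 2] == '\r' && b[end - 1] == '\n') return end - 2;
        if (end >= 1 && (b[end - 1] == '\r' || b[end - 1] == '\n')) return end - 1;
        return end;
    }

    public static void main(String[] args) {
        byte[] smap = "SMAP\nA.kt\nKotlin\n*E\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(endsWithEndSection(smap));
    }
}
```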

    Now we can use the union layout element with a recursive self-call to build the following format string, which matches SMAPs correctly in most cases. Its only assumption is that the line terminator after *E has the same length as the one before it: if \r\n precedes *E, \r\n is also expected after it, and if a single \r or \n precedes it, a single \r or \n is expected after it (which of the two does not matter, just not \r\n). If a single terminator before *E were followed by \r\n after it, packing would fail with a complaint that one byte was not handled. If we instead checked for two characters and only one were left, we would get an ArrayIndexOutOfBoundsException. I consider mixed line terminators the less likely case, so I accepted this limitation.

    So here is my current approach:

    [TB(10)[TB(42)[TB(69)[TB(13,10)[]()[(0)]]()[(0)]]()[(0)]](13)[TB(10)[TB(42)[TB(69)[TB(13)[TB(10)[]()[(0)]]()[(0)]]()[(0)]]()[(0)]](42)[TB(69)[TB(13,10)[]()[(0)]]()[(0)]]()[(0)]]()[(0)]]
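
To illustrate how this layout walks over the bytes, here is a rough simulation in plain Java. It is a sketch of the matching logic as described above, not pack200's own code: the class name is invented, the recursive self-call is modeled as a loop, and running past the end of the array stands in for the ArrayIndexOutOfBoundsException mentioned earlier.

```java
import java.nio.charset.StandardCharsets;

final class SmapLayoutSimulator {
    /** Returns the number of bytes consumed until the end section has been matched. */
    static int consume(byte[] d) {
        int pos = 0;
        while (true) {                          // one iteration models one call of the layout
            int tag = d[pos++];                 // outermost TB reads one tag byte
            if (tag == '\n') {                  // (10): expect "*E", then \r or \n
                if (d[pos++] == '*' && d[pos++] == 'E') {
                    int last = d[pos++];
                    if (last == '\r' || last == '\n') return pos;
                }
            } else if (tag == '\r') {           // (13): "\r\n*E\r\n", or "\r*E" plus \r or \n
                int next = d[pos++];
                if (next == '\n') {
                    if (d[pos++] == '*' && d[pos++] == 'E'
                            && d[pos++] == '\r' && d[pos++] == '\n') return pos;
                } else if (next == '*' && d[pos++] == 'E') {
                    int last = d[pos++];
                    if (last == '\r' || last == '\n') return pos;
                }
            }
            // Every other byte hits the default case ()[(0)]: start over with the next byte.
        }
    }

    static int consume(String s) {
        return consume(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // The SMAP from the hexdump in the question.
        String smap = "SMAP\nBeanDefinitionDsl.kt\nKotlin\n*S Kotlin\n*F\n+ 1 BeanDefinitionDsl.kt\n"
            + "org/springframework/context/support/BeanDefinitionDsl$bean$1$customizer$1\n"
            + "*L\n1#1,273:1\n*E\n";
        int total = smap.getBytes(StandardCharsets.UTF_8).length;
        System.out.println(consume(smap) + " of " + total + " bytes consumed");
    }
}
```

If the returned count is smaller than the array length, that corresponds to the "one byte was not handled" failure; an exception corresponds to the layout expecting more bytes than are present.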
    

    For better understanding, here is the same format with some spacing and explanatory comments. In this form it cannot be used directly; it first has to be passed through com.sun.java.util.jar.pack.Attribute#normalizeLayoutString, which is a public static method in a package-private class and therefore not normally accessible. If you use reflection, use Groovy (which does that for you), or copy the method body into your own method, you can of course use this version in your code.

    [
       # covered likely cases:
       # \\n*E\\n
       # \\r\\n*E\\r\\n
       # \\r*E\\r
       #
       # covered unlikely cases:
       # \\n*E\\r
       # \\r*E\\n
       #
       # uncovered unlikely cases:
       # \\n*E\\r\\n
       # \\r*E\\r\\n
       # \\r\\n*E\\r
       # \\r\\n*E\\n
       TB
          (\\\n) [
             # covered cases:
             # \\n*E\\r
             # \\n*E\\n
             TB
                (\\*) [
                   TB
                      (\\E) [
                         TB
                            (\\\r, \\\n) []
                            () [(0)]
                      ]
                      () [(0)]
                ]
                () [(0)]
          ]
          (\\\r) [
             # covered cases:
             # \\r\\n*E\\r\\n
             # \\r*E\\r
             # \\r*E\\n
             TB
                (\\\n) [
                   # covered cases:
                   # \\r\\n*E\\r\\n
                   TB
                      (\\*) [
                         TB
                            (\\E) [
                               TB
                                  (\\\r) [
                                     TB
                                        (\\\n) []
                                        () [(0)]
                                  ]
                                  () [(0)]
                            ]
                            () [(0)]
                      ]
                      () [(0)]
                ]
                (\\*) [
                   # covered cases:
                   # \\r*E\\r
                   # \\r*E\\n
                   TB
                      (\\E) [
                         TB
                            (\\\r, \\\n) []
                            () [(0)]
                      ]
                      () [(0)]
                ]
                () [(0)]
          ]
          () [(0)]
    ]
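
For readers who cannot reach normalizeLayoutString, the following is a simplified stand-in written for this answer, not the JDK method itself. It assumes that normalization only needs to drop whitespace and control characters, drop # comments up to the end of the line, and replace a backslash-escaped character with its decimal code, which is all the annotated layout above relies on. The class and method names are hypothetical.

```java
final class LayoutNormalizer {
    /** Strips whitespace and # comments and maps "\c" to the decimal code of c. */
    static String normalize(String layout) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < layout.length()) {
            char ch = layout.charAt(i++);
            if (ch <= ' ') {
                continue;                                   // whitespace and control characters
            } else if (ch == '#') {
                while (i < layout.length()
                        && layout.charAt(i) != '\n' && layout.charAt(i) != '\r') {
                    i++;                                    // skip comment up to the line end
                }
            } else if (ch == '\\') {
                if (i >= layout.length()) {
                    throw new IllegalArgumentException("dangling backslash");
                }
                out.append((int) layout.charAt(i++));       // character reference to decimal
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // One branch of the annotated layout, written as a Java string literal.
        String annotated =
              "TB                  # read one tag byte\n"
            + "   (\\*) [\n"
            + "      TB\n"
            + "         (\\E) [\n"
            + "            TB\n"
            + "               (\\\r, \\\n) []\n"
            + "               () [(0)]\n"
            + "         ]\n"
            + "         () [(0)]\n"
            + "   ]\n"
            + "   () [(0)]\n";
        // prints TB(42)[TB(69)[TB(13,10)[]()[(0)]]()[(0)]]()[(0)]
        System.out.println(normalize(annotated));
    }
}
```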