How to get the uncompressed size of an LZMA2 file (.xz / liblzma)


I'm looking for a way to get the uncompressed stream size of an LZMA2 / .xz file compressed with the xz utility.

I'm using liblzma on Windows/Linux for this task, so I guess I'm looking for a C/C++ API in liblzma that will do the trick.


Solution

  • I think I've found a solution.

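    This is a very crude code sample, but it seems to work fine. Note that it only looks at the footer at the very end of the file, so it assumes a single-stream .xz file with no trailing stream padding (which is what the xz utility produces by default).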

    I'm assuming I have a do_mmap() function that maps the entire file read-only into memory and reports the total size mapped through its second argument, plus a matching do_unmap(). This can naturally be adapted to use read/fread/ReadFile or any other file API.
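
    For reference, here is one way the two helpers might look on a POSIX system. This is only a sketch: the signatures of do_mmap() and do_unmap() are inferred from how the function below calls them, and returning NULL on failure is an assumption, not something the original helpers are known to do. A Windows version would use CreateFileMapping/MapViewOfFile instead.

    ```c
    #include <fcntl.h>
    #include <limits.h>
    #include <stddef.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Hypothetical helper: maps `filename` read-only and stores the mapped
     * size in *size_out. Returns NULL on failure (assumed convention),
     * including for empty files and files larger than INT_MAX bytes. */
    static void *do_mmap(const char *filename, int *size_out)
    {
        int fd = open(filename, O_RDONLY);
        if (fd < 0)
            return NULL;

        struct stat st;
        if (fstat(fd, &st) != 0 || st.st_size <= 0 || st.st_size > INT_MAX) {
            close(fd);
            return NULL;
        }

        void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd); /* the mapping stays valid after the descriptor is closed */
        if (p == MAP_FAILED)
            return NULL;

        *size_out = (int)st.st_size;
        return p;
    }

    /* Hypothetical counterpart: releases a mapping created by do_mmap(). */
    static void do_unmap(void *p, int size)
    {
        if (p != NULL)
            munmap(p, (size_t)size);
    }
    ```

    Because the size is an int here (to match the function below), this sketch refuses files of 2 GiB or more; switching both sides to size_t would lift that limit.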

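    #include <stdint.h>
    #include <lzma.h>

    // Returns the uncompressed size in bytes, or (size_t)-1 on error
    extern size_t get_uncompressed_size(const char *filename)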
    {
       lzma_stream_flags stream_flags;
       int file_size;
    
       const uint8_t *data = (uint8_t *) do_mmap(filename, &file_size);
    
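       // Assumption: do_mmap() returns NULL on failure
       if (data == NULL)
         return (size_t)-1;
    
       // 12 (LZMA_STREAM_HEADER_SIZE) is the size of the footer per the file spec;
       // anything shorter cannot be a valid .xz file
       if (file_size < LZMA_STREAM_HEADER_SIZE) {
         do_unmap((void *)data, file_size);
         return (size_t)-1;
       }
       const uint8_t *footer_ptr = data + file_size - LZMA_STREAM_HEADER_SIZE;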
    
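       // Decode the footer to get backward_size, the size of the index
       if (lzma_stream_footer_decode(&stream_flags, footer_ptr) != LZMA_OK) {
         do_unmap((void *)data, file_size);
         return (size_t)-1;
       }
       // The index cannot be larger than what precedes the footer
       if (stream_flags.backward_size > (lzma_vli)(footer_ptr - data)) {
         do_unmap((void *)data, file_size);
         return (size_t)-1;
       }
       // The index, where the size is ultimately stored, ends right before the footer
       const uint8_t *index_ptr = footer_ptr - stream_flags.backward_size;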
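       // lzma_index_buffer_decode() allocates the index itself and ignores the
       // old value of the pointer, so start from NULL rather than lzma_index_init()
       lzma_index *index = NULL;
       // memlimit is an in/out parameter and must be initialized;
       // UINT64_MAX effectively disables the memory limit
       uint64_t memlimit = UINT64_MAX;
       size_t in_pos = 0;
       // Decode the index located above
       lzma_ret ret = lzma_index_buffer_decode(&index, &memlimit, NULL, index_ptr, &in_pos, footer_ptr - index_ptr);
       // If decoding failed or did not consume the whole index, we might be
       // dealing with something utterly corrupt
       if (ret != LZMA_OK || in_pos != stream_flags.backward_size) {
         do_unmap((void *)data, file_size);
         lzma_index_end(index, NULL);
         return (size_t)-1;
       }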
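       // Finally get the size, then release the index and the mapping
       lzma_vli uSize = lzma_index_uncompressed_size(index);
       lzma_index_end(index, NULL);
       do_unmap((void *)data, file_size);
       // Note: the cast truncates if the size does not fit in size_t (32-bit builds)
       return (size_t) uSize;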
    }
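
    To make the "12-byte footer" step less opaque, here is a small sketch of what the footer contains, following the layout in the .xz file format specification: a 4-byte CRC32, a 4-byte little-endian stored Backward Size, 2 bytes of Stream Flags, and the magic bytes "YZ". The real size of the index is (stored value + 1) * 4. The function name xz_footer_backward_size is made up for illustration, and unlike lzma_stream_footer_decode() this sketch does not verify the CRC32, so it is not a replacement for the liblzma call.

    ```c
    #include <stdint.h>

    #define XZ_FOOTER_SIZE 12

    /* Illustrative only: extracts the size of the index (in bytes) from a raw
     * 12-byte stream footer. Returns 0 if the magic bytes do not match.
     * The CRC32 in bytes 0..3 is NOT checked here. */
    static uint64_t xz_footer_backward_size(const uint8_t footer[XZ_FOOTER_SIZE])
    {
        /* Bytes 10..11: footer magic "YZ" */
        if (footer[10] != 'Y' || footer[11] != 'Z')
            return 0;

        /* Bytes 4..7: stored Backward Size, little-endian */
        uint32_t stored = (uint32_t)footer[4]
                        | (uint32_t)footer[5] << 8
                        | (uint32_t)footer[6] << 16
                        | (uint32_t)footer[7] << 24;

        /* Real size of the index field = (stored + 1) * 4 */
        return ((uint64_t)stored + 1) * 4;
    }
    ```

    The value returned corresponds to stream_flags.backward_size in the function above, which is why subtracting it from footer_ptr lands on the start of the index.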