julia compressed-files

Julia: how to read a bz2-compressed text file


In R, I can read a whole compressed text file into a character vector with

readLines("file.txt.bz2")

readLines transparently decompresses .gz and .bz2 files but also works with uncompressed files. Is there something analogous available in Julia? I can do

text = open(f -> read(f, String), "file.txt")

but this cannot open compressed files. What is the preferred way to read bzip2 files? Is there any approach (besides manually checking the filename extension) that can deduce compression format automatically?


Solution

  • I don't know of anything automatic, but this is how you could (create and) read a bz2-compressed file:

    using CodecBzip2 # install first with `] add CodecBzip2` in the REPL

    # Creating a dummy bz2 file
    mystring = "Hello StackOverflow!"
    mystring_compressed = transcode(Bzip2Compressor, mystring) # compressed bytes (Vector{UInt8})
    write("testfile.bz2", mystring_compressed)

    # Reading and uncompressing it
    compressed = read("testfile.bz2")                # raw compressed bytes
    plain = transcode(Bzip2Decompressor, compressed) # decompressed bytes
    String(plain) # "Hello StackOverflow!"

    Streaming variants are also available (Bzip2CompressorStream and Bzip2DecompressorStream). For more, see the CodecBzip2.jl documentation.
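
    As a rough sketch of how the streaming API could stand in for R's readLines, the snippet below wraps an open file in a Bzip2DecompressorStream and calls readlines on it. It also shows one possible way to avoid checking the filename extension: bzip2 data begins with the three ASCII bytes "BZh", so peeking at the start of the file can tell a bz2 file from plain text. The helper names is_bzip2 and readlines_auto are hypothetical (they are not part of CodecBzip2), and the check only covers bzip2; other formats such as gzip would need their own codec package and signature test.

    ```julia
    using CodecBzip2

    # Hypothetical helper: true if the file starts with the bzip2 signature "BZh".
    function is_bzip2(path::AbstractString)
        open(path) do io
            magic = read(io, 3)      # reads at most 3 bytes
            return magic == b"BZh"
        end
    end

    # Hypothetical helper: read all lines, decompressing only if the file looks like bzip2.
    function readlines_auto(path::AbstractString)
        if is_bzip2(path)
            stream = Bzip2DecompressorStream(open(path))
            try
                return readlines(stream)
            finally
                close(stream)        # closing the stream also closes the wrapped file
            end
        else
            return readlines(path)
        end
    end

    # Small demo: write the same text once compressed and once as plain text.
    dir = mktempdir()
    bz_path = joinpath(dir, "demo.txt.bz2")
    txt_path = joinpath(dir, "demo.txt")
    text = "first line\nsecond line\n"
    write(bz_path, transcode(Bzip2Compressor, Vector{UInt8}(text)))
    write(txt_path, text)

    lines_bz = readlines_auto(bz_path)
    lines_txt = readlines_auto(txt_path)
    ```

    Both calls should give the same vector of lines. Unlike the transcode approach, the streaming version decompresses as it reads, which may matter for files too large to hold in memory twice (for example with eachline instead of readlines).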