
How to decode and read a zstd file in Rust?


I'm looking for advice on how to decode and read a zstd file. I'm feeling a bit lost, since this is my first big project since I started learning Rust.

I am using Rust because this project is for an internship, and the existing data export/compression tool was written in Rust long ago, so I thought I could take some inspiration from it. I am learning Rust from scratch, so I am not yet familiar with the structs and functions used for file I/O. I have a code snippet that does not currently work, and I have some questions:

use std::fs::File;
use std::io::{self, BufReader};
use zstd::stream::read::Decoder;

fn read_lines<P>(filename: P) -> io::Result<io::Lines<io::BufReader<File>>>
where P: AsRef<Path>, { 
    if let Ok(file) = File::open(filename) { 
        if let Ok(buf_reader) = BufReader::new(file) { 
            if let Ok(decoder) = Decoder::new(buf_reader) { 
                return Ok(io::BufReader::new(decoder).lines()); } } } }

if let Ok(lines) = read_lines(filename) {

    for line in lines {
        if let Ok(ip) = line {
            println!("{}", ip)

        }
    }
}

Since it is a compressed file, should I decode it as a whole first and then start reading line by line? I know that the decompressed files are in JSONL format, so each line is a separate JSON document. If the file is too big to read in one go, how should I proceed?

Also, if you use a package other than zstd that you would recommend, please share it with me. I would appreciate any help.


Solution

  • You're going about it the right way: wrapping the Decoder in a BufReader lets you read lines from the compressed file without loading the whole file up-front. The outer BufReader you use to read the lines pulls chunks from the decoder until a newline is reached, and each read from the decoder in turn decodes a chunk from the file.

    You just haven't got the structure and return type correct. Here's what I would do:

    use std::fs::File;
    use std::io::{BufRead, BufReader, Error as IoError, Lines};
    use std::path::Path;
    
    use zstd::stream::read::Decoder;
    
    fn read_lines<P>(filename: P) -> Result<Lines<BufReader<Decoder<'static, BufReader<File>>>>, IoError>
    where
        P: AsRef<Path>,
    {
        let file = File::open(filename)?;
        let decoder = Decoder::new(file)?;
        Ok(BufReader::new(decoder).lines())
    }
    

    To explain a bit more:

    • since File::open and Decoder::new both return a Result whose error type is std::io::Error, we can use ? to return the error early and avoid nested if-lets.
    • Decoder::new takes in a reader type and creates a Decoder<'_, BufReader<_>> (i.e. it creates a BufReader for the File itself), so we don't have to do that part.
    • the return type spells out all the layers nested within each other, but if you like you can replace it with Result<Lines<impl BufRead>, IoError> in this instance to keep it concise.
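
As a rough illustration of the last point, here is a minimal sketch of the `Lines<impl BufRead>` return type together with a loop that consumes the lines one at a time. To keep it runnable without a compressed file, it is generic over any `Read` and uses an in-memory `Cursor` as a stand-in; in the real program you would pass the zstd `Decoder` instead. The names `lines_from` and `for_each_record` are hypothetical helpers, not part of std or the zstd crate. Unlike an `if let Ok(..)` loop, this version uses `?` so a read or decode error stops the loop rather than being skipped silently.

```rust
use std::io::{self, BufRead, BufReader, Cursor, Lines, Read};

/// Hypothetical helper: wraps any reader in a `BufReader` and returns its
/// line iterator. `impl BufRead` hides the concrete nested reader type.
fn lines_from<R: Read>(reader: R) -> Lines<impl BufRead> {
    BufReader::new(reader).lines()
}

/// Hypothetical helper: streams the input line by line, calls `handle` on
/// every non-empty line, and returns how many lines were handled.
/// Only one line is held in memory at a time.
fn for_each_record<R: Read>(reader: R, mut handle: impl FnMut(&str)) -> io::Result<usize> {
    let mut count = 0;
    for line in lines_from(reader) {
        let line = line?; // propagate I/O (or decoding) errors to the caller
        if line.trim().is_empty() {
            continue; // skip blank lines between records
        }
        handle(&line);
        count += 1;
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // Stand-in for the decoder: a few JSONL bytes held in memory.
    let input = Cursor::new("{\"id\":1}\n\n{\"id\":2}\n");
    let handled = for_each_record(input, |record| println!("{record}"))?;
    println!("handled {handled} records");
    Ok(())
}
```

Because the helpers only require `Read`, the same loop should work unchanged when the reader is `Decoder::new(file)?`, and each line can then be handed to whatever JSON parser you choose.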