How to build and publish a crate containing generated code


Summary

I have a git repo containing protobuf definitions. I want to create a Python package (this part is working) and a Rust crate (this is not working) containing the code generated from these protobuf definitions.

Problem

The problem is that Cargo doesn't seem to like that I placed the proto directory outside of the Cargo.toml tree. I don't want to move the protos inside the Cargo.toml tree, because they are supposed to be shared with the Python tree and moving them would get confusing.

This is why the build script checks two relative parent directories for the proto files: one for cargo check (where the code stays where it is now) and one for cargo package (where the code is copied somewhere inside target/).

This two-directory-check workaround works locally, but fails once the crate is published, because the generated code isn't part of the published package. When I try to use the crate, the build script runs again, and this time the proto definitions are missing. But I don't want to ship the protos, just the hand-written and the generated source code.

Question

What is the best way to package and publish the generated code without shipping the protos with it? I would prefer not to check the generated code into git.

Details

The directory structure looks like this:

src
├── proto
│   └── helloworld.proto
├── python
│   ├── my-msgs
│   │   ├── generated
│   │   │   ├── <generated code>
│   ├── requirements.txt
│   └── setup.py
└── rust
    └── my-msgs
        ├── build.rs
        ├── Cargo.lock
        ├── Cargo.toml
        └── src
            ├── lib.rs
            └── test
                ├── helloworld.rs
                └── mod.rs

I am using protobuf-codegen and a build script as recommended in the documentation.

build.rs:

use std::fs;
use std::path::{Path, PathBuf};

fn find_proto_files(input_dir: &Path) -> impl Iterator<Item = PathBuf> {
    // given a directory, return the proto files in it
    let dir_read_result = fs::read_dir(input_dir);
    match dir_read_result {
        Ok(it) => it
            .map(|it| it.expect("Unable to read directory entry").path())
            // keep only entries whose extension is exactly "proto"
            .filter(|it| match it.extension() {
                Some(e) => e == "proto",
                None => false,
            }),
        Err(err) => panic!(
            "Unable to read directory contents of {}: {:?}",
            input_dir.display(),
            err
        ),
    }
}

fn determine_input_dir() -> Option<PathBuf> {
    let candidates = ["../../proto", "../../../../../proto"];
    match candidates.iter().map(Path::new).find(|it| it.is_dir()) {
        Some(it) => Some(it.to_owned()),
        None => {
            eprintln!(
                "Unable to find directory containing proto files. Checked for {:?}",
                candidates
            );
            None
        }
    }
}

fn main() {
    if let Some(input_dir) = determine_input_dir() {
        println!("Reading proto files from {}", input_dir.display());
        protobuf_codegen::Codegen::new()
            .protoc()
            .includes(&[
                input_dir.to_owned(),
                Path::new("/usr/local/include/google/protobuf/").to_owned(),
            ])
            .inputs(find_proto_files(&input_dir))
            .cargo_out_dir("generated")
            .run_from_script();
    }
}
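
The candidate lookup in the build script can also be written against an explicit base directory, which makes it easier to test outside of Cargo. This is only a sketch: `first_existing_dir` is a hypothetical helper name, and falling back to "." when CARGO_MANIFEST_DIR is not set is an assumption made so the example also runs as a plain binary. The candidate paths are the ones from the build script above.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Return the first candidate (resolved against `base`) that is an existing directory.
/// `first_existing_dir` is a hypothetical helper, not part of any library.
fn first_existing_dir(base: &Path, candidates: &[&str]) -> Option<PathBuf> {
    candidates
        .iter()
        .map(|candidate| base.join(candidate))
        .find(|path| path.is_dir())
}

fn main() {
    // Cargo sets CARGO_MANIFEST_DIR for build scripts; "." is an assumed fallback.
    let base = env::var("CARGO_MANIFEST_DIR").unwrap_or_else(|_| ".".to_string());
    let candidates = ["../../proto", "../../../../../proto"];
    match first_existing_dir(Path::new(&base), &candidates) {
        Some(dir) => println!("Reading proto files from {}", dir.display()),
        None => eprintln!("No proto directory found; checked {:?}", candidates),
    }
}
```

This does not solve the publishing problem described above. It only makes the lookup independent of the current working directory.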

Solution

  • I eventually came up with my own solution using workspaces.

    1. Set up a workspace with two sub-packages.
    2. One is a library; it will contain the generated code.
    3. One is a binary; it will host the code generator.

    In the binary one, put the code from build.rs into main.rs and make it usable from the command line.

    In the library one, package the generated code.

    Directory layout:

    ├── proto
    │   └── helloworld.proto
    ├── python
    │   ├── <snip>
    └── rust
        ├── Cargo.toml
        ├── generator
        │   ├── Cargo.toml
        │   └── src
        │       └── main.rs
        ├── mymsgs
        │   ├── Cargo.toml
        │   └── src
        │       ├── generated
        │       │   ├── helloworld.rs
        │       │   └── mod.rs
        │       ├── lib.rs
        │       └── test
        │           ├── helloworld.rs
        │           └── mod.rs
        └── src
            └── lib.rs
    
    

    The generator looks something like this:

    use std::fs;
    use std::io::Error;
    
    use clap::Parser;
    use std::path::{Path, PathBuf};
    
    #[derive(Parser, Debug)]
    #[command(name = "generator")]
    #[command(long_about = None)]
    pub struct Args {
        /// Directory containing the protobuf files
        #[arg(short, long, value_parser = dir_parser, value_name = "DIR")]
        protos: PathBuf,
    
        /// Directory to write the generated code to
        #[arg(short, long, value_parser = dir_parser, value_name = "DIR")]
        output: PathBuf,
    }
    
    /// Parses and validates str arguments as directories for use with command line argument parser
    ///
    /// # Arguments
    ///
    /// * `arg_value`: argument value given on command line
    ///
    /// returns: Result<PathBuf, String>
    ///
    fn dir_parser(arg_value: &str) -> Result<PathBuf, String> {
        let path = PathBuf::from(arg_value);
    
        match path.try_exists() {
            Ok(true) => match path.is_dir() {
                true => Ok(path),
                false => Err("Path is not a directory".to_string()),
            },
            Ok(false) => Err("Path does not exist".to_string()),
            Err(err) => Err(format!("Unable to access path: {}", err)),
        }
    }
    
    /// Given a directory path, return all "*.proto" files in it
    ///
    /// # Arguments
    ///
    /// * `input_dir`: Path to read files from
    ///
    /// returns: Result<impl Iterator<Item=PathBuf>+Sized, Error>
    ///
    fn find_proto_files(input_dir: &Path) -> Result<impl Iterator<Item = PathBuf>, Error> {
        fs::read_dir(input_dir).map(|entries| {
            entries
                .map(|entry_r| entry_r.expect("Unable to read directory entry").path())
                .filter(|pathbuf| pathbuf.extension().map_or(false, |e| e == "proto"))
        })
    }
    
    fn main() {
        let args = Args::parse();
        protobuf_codegen::Codegen::new()
            // Use `protoc` parser
            .protoc()
            // .protoc_path(&Path::new("/usr/local/bin/protoc"))
            // All inputs and imports from the inputs must reside in `includes` directories.
            .includes(&[
                args.protos.to_owned(),
                Path::new("/usr/local/include/google/protobuf/").to_owned(),
            ])
            // Inputs must reside in some of include paths.
            // .input("src/protos/apple.proto")
            .inputs(find_proto_files(&args.protos).expect("Unable to find proto files"))
            // Specify output directory
            .out_dir(args.output)
            .run_from_script();
    }
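
To see what the extension filter in the generator picks up, here is a small standalone sketch. It is a variant, not the code above: `proto_files` is a hypothetical name, it returns a sorted Vec instead of an iterator, and it propagates per-entry errors with `?` instead of panicking. The temporary directory and file names are made up for the demonstration.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collect the paths directly inside `dir` whose extension is exactly "proto".
/// Sorted so the result does not depend on directory iteration order.
fn proto_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.extension().map_or(false, |ext| ext == "proto") {
            files.push(path);
        }
    }
    files.sort();
    Ok(files)
}

fn main() -> io::Result<()> {
    // Throwaway directory with one proto file and one unrelated file.
    let dir = std::env::temp_dir().join(format!("proto_filter_demo_{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    fs::write(dir.join("helloworld.proto"), "syntax = \"proto3\";\n")?;
    fs::write(dir.join("README.md"), "not a proto\n")?;

    // Only the .proto file should be listed.
    for path in proto_files(&dir)? {
        println!("{}", path.display());
    }
    fs::remove_dir_all(&dir)
}
```

Like the original, this only looks at the top level of the directory. Protos in subdirectories would need a recursive walk.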
    
    

    To generate and publish:

    1. cargo run -p generator -- -p ../proto -o ./mymsgs/src/generated/
    2. cargo publish -p mymsgs --allow-dirty

    This isn't exactly the cleanest way to do it: because the contents of generated aren't in git, cargo won't publish without --allow-dirty. But I guess it's workable.
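
If the two steps get run often, they could be wrapped in a small helper so the arguments live in one place. The following is a rough sketch under assumptions: the function names `generate_command` and `publish_command` are hypothetical, and the `--dry-run` switch (a standard cargo publish flag) is my addition so the publish step can be rehearsed first. The sketch only builds and prints the commands; nothing is executed.

```rust
use std::path::Path;
use std::process::Command;

/// Build (but do not run) the generator invocation from step 1.
fn generate_command(protos: &Path, output: &Path) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args(["run", "-p", "generator", "--", "-p"])
        .arg(protos)
        .arg("-o")
        .arg(output);
    cmd
}

/// Build (but do not run) the publish invocation from step 2.
/// With `dry_run` set, cargo packages and verifies without uploading.
fn publish_command(dry_run: bool) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args(["publish", "-p", "mymsgs", "--allow-dirty"]);
    if dry_run {
        cmd.arg("--dry-run");
    }
    cmd
}

fn main() {
    let generate = generate_command(Path::new("../proto"), Path::new("./mymsgs/src/generated/"));
    let publish = publish_command(true);
    // Only print the commands here; call `.status()` on each to actually run them.
    println!("{:?}", generate);
    println!("{:?}", publish);
}
```

Running the real commands would mean calling `.status()` on each and checking the exit status before moving on to the publish step.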

    I'm open to suggestions - I'm sure this isn't the best possible solution, but it's the one I could come up with.