
How are curly braces being escaped within the quote! macro?


I am currently attempting to write a derive macro for a custom trait. This is what I have so far:

use proc_macro2::TokenStream;
use quote::{quote, quote_spanned};
use syn::spanned::Spanned;
use syn::{
    parse_macro_input, parse_quote, Data, DeriveInput, Fields, GenericParam, Generics, Index,
};

#[proc_macro_derive(HeapSize)]
pub fn derive_heap_size(input: proc_macro::TokenStream) -> proc_macro::TokenStream {
    // Parse the input tokens into a syntax tree.
    let input = parse_macro_input!(input as DeriveInput);

    // Used in the quasi-quotation below as `#name`.
    let name = input.ident;

    // Add a bound `T: HeapSize` to every type parameter T.
    let generics = add_trait_bounds(input.generics);
    let (impl_generics, ty_generics, where_clause) = generics.split_for_impl();

    // Generate an expression to sum up the heap size of each field.
    let sum = heap_size_sum(&input.data);

    let expanded = quote! {
        // The generated impl.
        impl #impl_generics lestream::FromLeBytes for #name #ty_generics #where_clause {
            fn heap_size_of_children(&self) -> usize {
                #sum
            }
        }
    };

    // Hand the output tokens back to the compiler.
    proc_macro::TokenStream::from(expanded)
}

// Add a bound `T: HeapSize` to every type parameter T.
fn add_trait_bounds(mut generics: Generics) -> Generics {
    for param in &mut generics.params {
        if let GenericParam::Type(ref mut type_param) = *param {
            type_param.bounds.push(parse_quote!(lestream::FromLeBytes));
        }
    }
    generics
}

// Generate an expression to sum up the heap size of each field.
fn heap_size_sum(data: &Data) -> TokenStream {
    match *data {
        Data::Struct(ref data) => {
            match data.fields {
                Fields::Named(ref fields) => {
                    // Expands to an expression like
                    //
                    //     0 + self.x.heap_size() + self.y.heap_size() + self.z.heap_size()
                    //
                    // but using fully qualified function call syntax.
                    //
                    // We take some care to use the span of each `syn::Field` as
                    // the span of the corresponding `heap_size_of_children`
                    // call. This way if one of the field types does not
                    // implement `HeapSize` then the compiler's error message
                    // underlines which field it is. An example is shown in the
                    // readme of the parent directory.
                    let q = quote! {
                        Self {
                    };

                    for field in fields.named {
                        let item_name = field.ident.expect("macro only works with named fields");
                        let item_type = field.ty;

                        quote! {
                            let #item_name = #item_type::from_le_bytes()
                        }
                    }
                }
                _ => panic!("The FromLeBytes derive can only be applied to structs"),
            }
        }
        Data::Enum(_) | Data::Union(_) => unimplemented!(),
    }
}

The idea is to derive the trait for a struct by implementing it in such a way that the trait method from_le_bytes() is called in order for each member of the struct:

use std::fmt::{Display, Formatter};

#[derive(Debug)]
pub enum Error {
    UnexpectedEndOfStream,
}

impl Display for Error {
    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
        match self {
            Self::UnexpectedEndOfStream => write!(f, "unexpected end of stream"),
        }
    }
}

impl std::error::Error for Error {}

pub trait FromLeBytes: Sized {
    fn from_le_bytes<T>(bytes: &mut T) -> Result<Self, Error>
    where
        T: Iterator<Item = u8>;
}

I.e.:

#[derive(FromLeBytes)]
struct Foo {
    bar: u8,
    spamm: u16,
}

should result in an implementation like

impl FromLeBytes for Foo {
    fn from_le_bytes<T>(bytes: &mut T) -> Result<Self, Error>
    where
        T: Iterator<Item = u8>,
    {
        Ok(Self { bar: u8::from_le_bytes(bytes)?, spamm: u16::from_le_bytes(bytes)? })
    }
}

However, I cannot figure out how to escape the curly braces of the struct constructor within the quote! macro. This is my first time writing a macro, so I'm also open to other suggestions if quote! is not the right tool here.


Solution

  • A way to do this was already described in the self-answer, but I'll try to add a bit more background to the problem in question.

    The reason for this error is that the output of quote! must be a sequence of valid Rust tokens or, more precisely, a sequence of TokenTrees. Rust has no token for a single opening or closing brace; instead, it has the concept of a group, that is, a subsequence of tokens enclosed in a matching pair of braces (or other Delimiters).

    As a consequence, it is invalid to have an unmatched delimiter anywhere in a TokenStream, and that is exactly what you were trying to do with quote!{ Self { }.


    As for why it necessarily has to be this way, consider the following code:

    fn foo() -> proc_macro2::TokenStream  {
        quote!{ { }; // (1)
        // imagine here's some code generating `TokenStream`,
        // so that the function would be valid if this `quote` is valid
    }
    
    fn bar() -> proc_macro2::TokenStream  {
        quote!{ { }; // (2)
        // imagine here's the same code as above in `foo`
        }
    }
    

    And let's ask ourselves: how exactly should the parser be traversing this code in each case?

    Note that the function bar here actually compiles. It doesn't do anything useful, of course, but it is correct: as written, the quote! macro in it generates a TokenStream containing a single empty block and a semicolon (the comment is stripped away). If the comment were replaced by some code, that code would be passed to quote! and not parsed by the Rust compiler, only lexed. It might well be nonsensical from the parser's point of view, but since it is quote! that receives these tokens, that doesn't matter.
    In other words, with bar the parser sees the opening brace of the macro and then consumes everything up to the matching closing brace as-is.

    Imagine, now, that we want foo to compile too, with quote! yielding a TokenStream that consists of a single opening brace. This means that the parser must treat the closing brace on line (1) as closing the quote! macro and then run over the rest of the tokens itself, since they are no longer in macro context and therefore must be parsed.

    But note that it is actually impossible to differentiate between these two cases when parsing lines (1) and (2): foo and bar are exactly the same sequence of tokens, except for one extra closing brace. To check whether this extra brace is actually there, the parser would need unbounded lookahead: scan to the end of the file, see that the brace is unmatched, then rewind and start parsing again.
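
    The "consume everything up to the matching closing brace" rule can be illustrated with a depth counter. This is a minimal standard-library sketch, not how rustc is implemented; matching_brace is a hypothetical name, and it deliberately ignores parentheses, brackets, string literals, and comments.

```rust
// Returns the index of the `}` that closes the `{` at byte offset `open`,
// or `None` if `open` is not a `{` or the input ends before it is closed.
// Simplification: only braces are tracked; a real lexer also handles
// `()`, `[]`, string literals, and comments.
fn matching_brace(src: &str, open: usize) -> Option<usize> {
    if src.as_bytes().get(open) != Some(&b'{') {
        return None;
    }
    let mut depth = 0usize;
    for (i, b) in src.bytes().enumerate().skip(open) {
        match b {
            b'{' => depth += 1,
            b'}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(i);
                }
            }
            _ => {}
        }
    }
    None
}
```

    Applied to a one-line version of bar, such as "fn bar() { quote!{ { }; code } }", both the macro's brace and the function's brace find a partner. Applied to the foo-like text "fn foo() { quote!{ { }; code }", the macro's brace pairs with the last closing brace, and the function's own opening brace is left without a match.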

    Furthermore, strictly speaking, it might well be impossible to know which brace must be treated as unmatched. Consider this:

    fn foo() {
        quote::quote!{ { }; { };
    }
    

    If Rust allowed unmatched braces in macros, this code would be ambiguous: where exactly must quote! end? At the first closing brace, so that the pair of braces after it is parsed as a block? Or at the last one, so that quote! itself gets a block as input (before the single brace)? The compiler will not decide in this case; it will, again, error out.


    In short: allowing unmatched braces would open the door to far too much complexity, for both compiler authors and language users.