Tags: parsing, rust, macros, abstract-syntax-tree

How to parse arbitrary tokens in Rust's procedural macros?


I am currently developing a parser for a DSL in Rust. I am using syn, quote and proc-macro2 to help with that.

What causes problems for me is that there are certain literal types in that DSL that I cannot parse. One example is single-quoted strings:

My TDD setup includes the following unit test:

#[test]
fn single_quoted_str() {
  let input: proc_macro2::TokenStream = quote!('single quoted');
  let literal = syn::parse2::<MySingleQuotedStringType>(input);
  assert!(literal.is_ok());
  ...
}

Unfortunately, the very first line already fails with a LexError. I also tried TokenStream::from_str(...) and syn::parse_str(...); both produce the same error.

How can I accept and parse completely arbitrary tokens in a macro? Using double quotes instead is not really an option, since the DSL already exists. The same problem applies to other literal types as well: for example, there is a date literal that follows the pattern date'2023-02-26'.

Is there any general solution for that? All I need is a string token obtained by splitting on whitespace; I can implement the rest manually.


Solution

  • In general, you cannot do this. The input of a proc macro is a token stream in which every token must be valid according to Rust's lexical grammar. That grammar has no single-quoted string literals: single quotes delimit a character literal, which holds exactly one character (such as 'x'), or introduce a lifetime (such as 'static).

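    Some things can work, like date "2023-02-26" (with a space), which is just an identifier followed by a string literal. Without the space this only lexes in the 2015 and 2018 editions: since Rust 2021, an identifier immediately followed by a quote is a reserved prefix and is rejected by the lexer. But again, you cannot have any tokens that don't exist in Rust.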

    If you really must parse the exact DSL you are describing: pass a single string literal to your proc macro that contains the DSL. For example:

    my_macro!("
        'some string'
        date'2023-02-26'
    ");
    

    Inside the macro you then receive a single string literal, and once you have its contents as a String you can lex and parse them however you like.
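As a minimal sketch of that last step, assume the macro has already extracted the literal's contents as a String (for example with syn's LitStr::value()). The hypothetical tokenize function below splits the DSL text on whitespace but keeps single-quoted strings together, and it treats a word directly followed by a quote, such as date'2023-02-26', as a prefixed literal. DslToken and its variants are illustrative names, not part of syn or proc-macro2. Because the sketch uses only the standard library, it can be unit-tested outside the proc-macro crate.

```rust
/// Hypothetical token type for the DSL; the variant names are illustrative only.
#[derive(Debug, PartialEq)]
enum DslToken {
    /// A single-quoted string such as 'some string' (quotes stripped).
    Str(String),
    /// A prefixed literal such as date'2023-02-26', stored as (prefix, contents).
    Prefixed(String, String),
    /// Any other whitespace-delimited word.
    Word(String),
}

/// Splits DSL source text into tokens. Whitespace separates tokens, except
/// inside single quotes, so 'some string' stays a single token.
fn tokenize(src: &str) -> Result<Vec<DslToken>, String> {
    let mut tokens = Vec::new();
    let mut chars = src.char_indices().peekable();

    while let Some(&(start, c)) = chars.peek() {
        // Skip whitespace between tokens.
        if c.is_whitespace() {
            chars.next();
            continue;
        }

        // Collect characters up to whitespace or an opening quote.
        let mut word = String::new();
        while let Some(&(_, c)) = chars.peek() {
            if c.is_whitespace() || c == '\'' {
                break;
            }
            word.push(c);
            chars.next();
        }

        if let Some(&(_, '\'')) = chars.peek() {
            chars.next(); // consume the opening quote
            let mut contents = String::new();
            let mut closed = false;
            // Read until the closing quote; whitespace is kept verbatim.
            for (_, c) in chars.by_ref() {
                if c == '\'' {
                    closed = true;
                    break;
                }
                contents.push(c);
            }
            if !closed {
                return Err(format!("unterminated quote starting at byte {}", start));
            }
            if word.is_empty() {
                tokens.push(DslToken::Str(contents));
            } else {
                tokens.push(DslToken::Prefixed(word, contents));
            }
        } else {
            tokens.push(DslToken::Word(word));
        }
    }
    Ok(tokens)
}

fn main() {
    // The same text as in the my_macro! example above.
    let src = "\n    'some string'\n    date'2023-02-26'\n";
    match tokenize(src) {
        Ok(tokens) => println!("{:?}", tokens),
        Err(e) => eprintln!("{}", e),
    }
}
```

This sketch does not handle escapes such as \' inside quoted strings; a real lexer for the DSL would need its own rule for those. In the proc macro, an Err from tokenize could be turned into a compile error pointing at the string literal.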