When Ferrous Metals Corrode, pt. XX

Intro

For this post I'm summarizing chapter 21, "Macros", of the book.

Macros are the more dynamic part of Rust, aka metaprogramming: code that writes code, e.g. to implement traits, generate boilerplate and such.

We've already seen assert_eq!. One thing macros can do that functions can't is refer to the file and line number where they're called; assert_eq! uses this to print an informative error message.

Macros are expanded into regular Rust code, then regular compilation commences. The assert macro call assert_eq!(gcd(6, 10), 2); is expanded to:

match (&gcd(6, 10), &2) {
    (left_val, right_val) => {
        if !(*left_val == *right_val) {
            panic!("assertion failed: `(left == right)`, \
                    (left: `{:?}`, right: `{:?}`)", left_val, right_val);
        }
    }
}

Expansion works recursively, i.e. the panic! macro call is expanded again. Unlike C preprocessor macros, Rust macros never insert unmatched brackets or parens.

Macro Basics

Basics of Macro Expansion

There are several mechanisms for defining macros. One is the macro_rules! macro.

Macros defined this way work entirely by pattern matching. Here's assert_eq! again, with comments:

macro_rules! assert_eq {
    ($left:expr, $right:expr) => ({    // match two exprs separated by a comma, named left and right
        match (&$left, &$right) {      // the matched expressions are pasted in here
            (left_val, right_val) => { // bind each value once, by reference
                if !(*left_val == *right_val) {
                    panic!("assertion failed: `(left == right)`, \
                            (left: `{:?}`, right: `{:?}`)", left_val, right_val);
                }
            }
        }
    })
}
  1. In the pattern above, $left:expr is a fragment named left, of type expr

  2. In the template, fragments are referred to by name with the dollar sign but without the type, e.g. $left

  3. The comma between the two fragments is matched verbatim

You can use parens, brackets or braces around patterns and templates; Rust doesn't mind which. By convention, macros are called with parens, vec! is called with brackets, and braces are used around macro definitions.
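To illustrate that the delimiter choice is only convention, here's a small sketch calling the standard vec! macro with each kind of delimiter; the function name delimiters is just for illustration:

```rust
// The standard vec! macro, invoked with each kind of delimiter.
fn delimiters() -> (Vec<i32>, Vec<i32>, Vec<i32>) {
    let with_parens = vec!(1, 2, 3);
    let with_brackets = vec![1, 2, 3]; // the conventional form for vec!
    let with_braces = vec! { 1, 2, 3 };
    (with_parens, with_brackets, with_braces)
}
```

All three produce the same Vec; only readability differs.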

Unintended Consequences

Macros can obscure what actually goes on under the hood in the generated Rust code. One thing to watch out for is input fragments with side effects: if the generated code evaluates a fragment several times, its side effects happen several times, which isn't obvious at the call site and can lead to surprising behaviour. That's why the macro above binds left_val and right_val: each input expression is evaluated exactly once, and the stored results are reused.
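A minimal sketch of the pitfall, using two hypothetical macros (not from the book): square_naive! pastes its input twice, while square_once! evaluates it once with the same match trick assert_eq! uses. The closure next counts how often it's called.

```rust
// Hypothetical macros for illustration.
macro_rules! square_naive {
    ($e:expr) => { $e * $e }; // pastes (and so evaluates) the expression twice
}

macro_rules! square_once {
    ($e:expr) => {
        match $e {
            v => v * v, // the expression is evaluated once and bound to v
        }
    };
}

fn demo() -> (i32, i32) {
    let mut counter = 0;
    let mut next = || { counter += 1; counter };
    let naive = square_naive!(next()); // next() runs twice: 1 * 2
    let once = square_once!(next());   // next() runs once: 3 * 3
    (naive, once)
}
```

Here demo() returns (2, 9): the naive version "squares" two different values.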

Similarly, that's why the match expression borrows references to the values: otherwise it would move non-Copy values out of the input expressions, and the caller could no longer use them afterwards. For something that looks like a function call, that would be surprising.
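To see the borrowing in action, here's a sketch with a trimmed copy of the rule above, renamed to the hypothetical my_assert_eq! so it doesn't clash with the standard macro. Because the match takes (&$left, &$right), the String passed in is still usable after the assertion; with (name, expected) instead, returning name would be a use-after-move compile error.

```rust
// Trimmed copy of the assert_eq! rule from above (renamed, hypothetical).
macro_rules! my_assert_eq {
    ($left:expr, $right:expr) => {
        match (&$left, &$right) {
            (left_val, right_val) => {
                if !(*left_val == *right_val) {
                    panic!("assertion failed: `(left == right)` (left: `{:?}`, right: `{:?}`)",
                           left_val, right_val);
                }
            }
        }
    };
}

fn check() -> String {
    let name = String::from("udon");
    let expected = String::from("udon");
    my_assert_eq!(name, expected); // only borrows name and expected
    name // still usable: nothing was moved into the macro
}
```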

Repetition

Example: the vec! macro:

// Repeat a value N times
let buffer = vec![0_u8; 1000];

// A list of values, separated by commas
let numbers = vec!["udon", "ramen", "soba"];

Here's a simplified version of its definition, with comments:

macro_rules! vec {
    // 1. with a semicolon -> ex. vec![0_u8; 1000]
    ($elem:expr ; $n:expr) => {
        ::std::vec::from_elem($elem, $n)
    };
    // 2. comma-sep -> ex vec![1, 2, 3]
    ( $( $x:expr ),* ) => {
        <[_]>::into_vec(Box::new([ $( $x ),* ]))
    };
    // 3. as above but with an extra comma at the end
    ( $( $x:expr ),+ ,) => {
        vec![ $( $x ),* ]
    };
}
  1. The first rule matches on the semicolon; this is for the form vec![value; size]

  2. The second rule handles a list of values we'd like to put in the vector. Much like in a regex, $( $x:expr ),* matches the x expression zero or more times, separated by commas. The same repetition syntax works in the template, where $( $x ),* emits each matched value

  3. The third rule uses $( ... ),+, which, again like a regex, matches one or more times. It accepts a trailing comma, strips it off and recurses into the second rule

Repetition patterns come in regex-like flavors: * for zero or more, + for at least one, and ? for zero or one repetitions. The repeated part can be written without a separator, $( ... )*, comma-separated, $( ... ),*, or semicolon-separated, $( ... );*. The ? form can't take a separator, since it matches at most once.
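Here's a small sketch combining the modifiers, using a hypothetical sum! macro (not from the book): $( $x:expr ),+ requires at least one comma-separated expression, $(,)? allows an optional trailing comma, and the template uses a separator-less $( + $x )* to emit one "+ x" per matched value.

```rust
// Hypothetical macro: one or more expressions, optional trailing comma.
macro_rules! sum {
    ( $( $x:expr ),+ $(,)? ) => {
        0 $( + $x )*   // expands to 0 + x1 + x2 + ...
    };
}
```

For example, sum!(1, 2, 3,) expands to 0 + 1 + 2 + 3. Since each $x is parsed as a whole expression, sum!(2 * 5) stays 0 + (2 * 5) without any C-style precedence surprises.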

The <[_]> syntax names the type "slice of something", where that something is left for Rust to infer; <[_]>::into_vec then converts the boxed slice into a Vec.
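The same conversion spelled out in two steps, as a sketch (the function name boxed_to_vec is just for illustration): first a boxed array is coerced to a boxed slice, then into_vec turns it into a Vec.

```rust
fn boxed_to_vec() -> Vec<i32> {
    // Step 1: a boxed array, coerced to a boxed slice by the type annotation
    let boxed: Box<[i32]> = Box::new([1, 2, 3]);
    // Step 2: <[_]> names "a slice of something"; Rust infers i32 here.
    // into_vec turns Box<[T]> into Vec<T>, reusing the same heap buffer.
    <[_]>::into_vec(boxed)
}
```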

Built-In Macros

file!(), line!(), column!()

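these expand to the current file name, line number and column number. If they're used inside another macro's expansion, they report the position of the first macro invocation that led there, i.e. the call site in your own code.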

stringify!(...tokens...)

put tokens into a string literal: stringify!(1 + 1) -> "1 + 1"

concat!(str0, str1, ...)

concatenate literal arguments into a single string literal

cfg!(cond)

expands to true if the current build configuration matches the condition, e.g. cfg!(debug_assertions), else false

env!("VAR_NAME"), option_env!("VAR_NAME")

expand to the value of the environment variable at compile time. The first form is a compile error if the variable isn't set; the second expands to an Option<&'static str> instead

include!("file.rs")

expands to the contents of the given file, parsed as Rust code

include_str!("file.txt"), include_bytes!("file.dat")

include a file's contents as a &'static str or as a &'static [u8; N] byte array

todo!(), unimplemented!()

these always panic when reached and are meant to stub out not-yet-implemented code, such as arms of a match or branches of an if

matches!(value, pattern)

evaluates to true if value matches pattern; equivalent to

match value {
  pattern => true,
  _ => false
}
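A quick sketch exercising a few of these built-ins; the function names builtins and location are just for illustration:

```rust
fn builtins() -> (&'static str, &'static str, bool, bool) {
    let tokens = stringify!(1 + 1);           // the tokens, as a string literal
    let joined = concat!("udon", 2, '!');     // literals glued into one &'static str
    let in_range = matches!(7, 1..=9);        // a pattern match, as a bool
    let debug_build = cfg!(debug_assertions); // true in a default debug build
    (tokens, joined, in_range, debug_build)
}

fn location() -> String {
    // file!() is a &'static str; line!() and column!() are u32
    format!("{}:{}:{}", file!(), line!(), column!())
}
```

Here builtins().0 is "1 + 1" and builtins().1 is "udon2!"; location() depends on where the code lives.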

Debugging Macros

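Rust has a few unstable (nightly-only) helpers for debugging macros:

  • Compile with expanded output. Use the rustc ... -Z unstable-options --pretty expanded flags for this (newer nightlies spell it -Z unpretty=expanded). Calling cargo build --verbose shows how cargo invokes rustc, so you can reuse that command line

  • Printf-style debugging with the log_syntax!() macro, which prints its arguments at compile time (requires #![feature(log_syntax)])

  • Print macro calls to the terminal as they're expanded by invoking trace_macros!(true) (requires #![feature(trace_macros)])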

Building the json! Macro

The chapter's running example is a json! macro. The goal is a macro that can parse this:

let students = json!([
    {
        "name": "Jim Blandy",
        "class_of": 1926,
        "major": "Tibetan throat singing"
    },
    {
        "name": "Jason Orendorff",
        "class_of": 1702,
        "major": "Knots"
    }
]);

Fragment Types

First, write rules to match the possible input types. Matching null, arrays and JSON objects is relatively straightforward:

macro_rules! json {
    (null)    => { Json::Null };
    ([ ... ]) => { Json::Array(...) };
    ({ ... }) => { Json::Object(...) };
}

We're assuming here that the macro produces values of a Json enum, with variants Json::Null, Json::Array etc.
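For concreteness, here's a sketch of such a Json enum, along the lines of the one the book defines in an earlier chapter (treat the exact variants as an assumption), together with the easy null rule:

```rust
use std::collections::HashMap;

// Assumed Json type: one variant per kind of JSON value.
#[derive(Clone, PartialEq, Debug)]
enum Json {
    Null,
    Boolean(bool),
    Number(f64),
    String(String),
    Array(Vec<Json>),
    Object(Box<HashMap<String, Json>>),
}

// Only the easy case so far: the literal token `null`.
macro_rules! json {
    (null) => {
        Json::Null
    };
}
```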

Implementing the null case is easy, but what about arrays? A list of expr fragments, as in the second rule below, doesn't work:

macro_rules! json {
    (null) => {
        Json::Null
    };
    ([ $( $element:expr ),* ]) => {
        Json::Array(vec![ $( $element ),* ])
    };
}

That's because $element:expr only matches Rust expressions, and JSON isn't always valid Rust: a nested object like { "name": "Jim Blandy" } isn't a Rust expression, so it fails to parse as an expr fragment.

Not every fragment has to be an expr, though. The tt fragment type, a "token tree", matches either a single token (such as 1926 or "Knots") or everything inside a matched pair of parens, brackets or braces, delimiters included.

This works better:

macro_rules! json {
    (null) => {
        Json::Null
    };
    ([ $( $element:tt ),* ]) => {
        Json::Array(...)
    };
    ({ $( $key:tt : $value:tt ),* }) => {
        Json::Object(...)
    };
    ($other:tt) => {
        ... // TODO: Return Number, String, or Boolean
    };
}
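To get a feel for how tt fragments carve up input, here's a small hypothetical helper (not from the book) that counts the token trees it receives, recursing on the rest of the input:

```rust
// Hypothetical helper: counts its input token trees.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($rest:tt)*) => { 1usize + count_tts!($($rest)*) };
}
```

A JSON key/value pair like "name" : [1, 2, 3] is three token trees (the key, the colon and the bracketed array), while a whole object { "a": 1, "b": 2 } is just one. That's exactly what the object rule's $key:tt : $value:tt pattern relies on.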

Recursion in Macros

We will need to expand the token trees from above recursively, i.e. by again calling the json! macro on them.

([ $( $element:tt ),* ]) => {
    Json::Array(vec![ $( json!($element) ),* ])
};

...

({ $( $key:tt : $value:tt ),* }) => {
    Json::Object(Box::new(vec![
        $( ($key.to_string(), json!($value)) ),*
    ].into_iter().collect()))
};   

Note that there's a recursion limit for macro expansion. The book gives 64 as the default (newer compilers use 128), and it can be raised with the crate-level attribute #![recursion_limit = "256"].

Using Traits with Macros

To parse the simple types (bool, string, numbers) it's best to use the From trait. E.g.:

impl From<i32> for Json {
    fn from(i: i32) -> Json {
        Json::Number(i as f64)
    }
}

To cover all the numeric types, let's write another macro:

macro_rules! impl_from_num_for_json {
    ( $( $t:ident )* ) => {
        $(
            impl From<$t> for Json {
                fn from(n: $t) -> Json {
                    Json::Number(n as f64)
                }
            }
        )*
    };
}

impl_from_num_for_json!(u8 i8 u16 i16 u32 i32 u64 i64 u128 i128
                        usize isize f32 f64);

This uses the ident fragment type, which matches any valid identifier. The macro takes a list of identifiers (here, type names) and implements From for Json for each of them.

The json macro looks like this now:

macro_rules! json {
    (null) => {
        Json::Null
    };
    ([ $( $element:tt ),* ]) => {
        Json::Array(vec![ $( json!($element) ),* ])
    };
    ({ $( $key:tt : $value:tt ),* }) => {
        Json::Object(Box::new(vec![
            $( ($key.to_string(), json!($value)) ),*
        ].into_iter().collect()))
    };
    ( $other:tt ) => {
        Json::from($other)  // Handle Boolean/number/string
    };
}

The last rule relies on our From implementations to convert booleans, numbers and strings.

Even this works:

let width = 4.0;
let desc =
    json!({
        "width": width,
        "height": (width * 9.0 / 4.0)
    });

This works because the parenthesized expression is a single token tree: it matches $value:tt in the object rule and ends up as the argument of Json::from.
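Putting the pieces together, here's a self-contained sketch of the finished macro. The Json enum is assumed (modeled on the book's), and the From impls for bool, &str and String are assumptions filling in the "Boolean/number/string" cases the final rule depends on:

```rust
use std::collections::HashMap;

#[derive(Clone, PartialEq, Debug)]
enum Json {
    Null,
    Boolean(bool),
    Number(f64),
    String(String),
    Array(Vec<Json>),
    Object(Box<HashMap<String, Json>>),
}

macro_rules! impl_from_num_for_json {
    ( $( $t:ident )* ) => {
        $(
            impl From<$t> for Json {
                fn from(n: $t) -> Json {
                    Json::Number(n as f64)
                }
            }
        )*
    };
}

impl_from_num_for_json!(u8 i8 u16 i16 u32 i32 u64 i64 u128 i128
                        usize isize f32 f64);

// Assumed conversions for the remaining simple types
impl From<bool> for Json {
    fn from(b: bool) -> Json { Json::Boolean(b) }
}

impl From<&str> for Json {
    fn from(s: &str) -> Json { Json::String(s.to_string()) }
}

impl From<String> for Json {
    fn from(s: String) -> Json { Json::String(s) }
}

macro_rules! json {
    (null) => { Json::Null };
    ([ $( $element:tt ),* ]) => {
        Json::Array(vec![ $( json!($element) ),* ])
    };
    ({ $( $key:tt : $value:tt ),* }) => {
        Json::Object(Box::new(vec![
            $( ($key.to_string(), json!($value)) ),*
        ].into_iter().collect()))
    };
    ( $other:tt ) => { Json::from($other) }; // Boolean, number or string
}

fn students() -> Json {
    json!([
        { "name": "Jim Blandy", "class_of": 1926, "major": "Tibetan throat singing" },
        { "name": "Jason Orendorff", "class_of": 1702, "major": "Knots" }
    ])
}

fn desc(width: f64) -> Json {
    // The parenthesized expression is one tt, handed to Json::from
    json!({ "width": width, "height": (width * 9.0 / 4.0) })
}
```

Unsuffixed literals like 1926 fall back to i32, so they pick up the From<i32> impl generated by impl_from_num_for_json!.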

Scoping and Hygiene

Consider an alternative implementation of the rule for parsing JSON objects. Instead of building the HashMap with collect, as in Json::Object(Box::new(vec![ ... ].into_iter().collect())), let's create a variable fields and call insert on it repeatedly:

({ $($key:tt : $value:tt),* }) => {
    {
        let mut fields = Box::new(HashMap::new());
        $( fields.insert($key.to_string(), json!($value)); )*
        Json::Object(fields)
    }
};

But what if the code calling the macro already has a variable named fields? It turns out we'd neither clobber it nor confuse the compiler. Rust macros are hygienic: conceptually, Rust renames the fields variable inside the macro, keeping the local variables of the macro and of its caller separate.
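A minimal sketch of hygiene at work, with a hypothetical declare_fields! macro. Its let statement isn't wrapped in a block, so with plain text substitution it would shadow the caller's variable; with hygiene, the caller's fields is unaffected.

```rust
// Hypothetical macro that declares a variable named `fields`.
macro_rules! declare_fields {
    () => {
        let fields = 99; // expands as a plain statement, no surrounding block
    };
}

fn hygiene_demo() -> i32 {
    let fields = 1;
    declare_fields!();
    fields // still the caller's `fields`: the macro's binding is kept separate
}
```

Here hygiene_demo() returns 1, not 99.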

This of course also can get in the way:

// suppose we had to run this line often, can we make a macro out of it?
let req = ServerRequest::new(server_socket.session());


// Well nope, this is not going to work...
macro_rules! setup_req {
    () => {
        // ...server_socket and req vars are not imported from the calling scope
        let req = ServerRequest::new(server_socket.session());
    }
}
fn handle_http_request(server_socket: &ServerSocket) {
    setup_req!();  // declares `req`, uses `server_socket`
    ... // code that uses `req`
}

The solution is simply to pass in, as ident arguments, the names of any variables the macro should use or declare:

macro_rules! setup_req {
    ($req:ident, $server_socket:ident) => {
        let $req = ServerRequest::new($server_socket.session());
    }
}

fn handle_http_request(server_socket: &ServerSocket) {
    setup_req!(req, server_socket);
    ... // code that uses `req`
}

Hygiene makes macro calls a bit wordier, but it's the more sensible default: it prevents weird bugs where a macro silently clobbers or captures variables in the caller's scope.
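Here's a runnable version of the fix, as a sketch. ServerSocket and ServerRequest are hypothetical stand-ins for the types in the book's example, stubbed out just enough to compile:

```rust
// Hypothetical stand-ins for the book's ServerSocket / ServerRequest types.
struct ServerSocket { id: u32 }
impl ServerSocket {
    fn session(&self) -> u32 { self.id }
}

struct ServerRequest { session: u32 }
impl ServerRequest {
    fn new(session: u32) -> ServerRequest { ServerRequest { session } }
}

macro_rules! setup_req {
    ($req:ident, $server_socket:ident) => {
        let $req = ServerRequest::new($server_socket.session());
    };
}

fn handle_http_request(server_socket: &ServerSocket) -> u32 {
    setup_req!(req, server_socket); // `req` is the caller's name, so it's visible here
    req.session
}
```

Because both identifiers come from the call site, they belong to the caller's scope, and the macro can both read server_socket and declare req.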

Importing and Exporting Macros

Marking a module with the #[macro_use] attribute exports the macros it defines upwards, so its parent module can use them after the mod declaration.

Marking a macro with #[macro_export] makes it public, so other crates can use it; it's exported from the root of the crate.

A public macro can't rely on anything being in scope where it's called, so it should use absolute paths to any names it uses. macro_rules! provides the special $crate variable for this, which acts like an absolute path to the root module of the crate where the macro was defined. E.g. we can write $crate::Json, which works even if the caller hasn't imported Json.
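A single-crate sketch of both ideas; the module names json_types, macros and caller and the json_bool! macro are hypothetical. #[macro_use] makes json_bool! visible after the macros module, and $crate lets the expansion find Json even though caller never imports it. In a library you'd also add #[macro_export] so other crates could use the macro.

```rust
mod json_types {
    #[derive(Debug, PartialEq)]
    pub enum Json { Null, Boolean(bool) }
}

#[macro_use]
mod macros {
    // $crate expands to the root of the crate defining this macro,
    // so the path resolves wherever the macro is invoked.
    macro_rules! json_bool {
        ($b:expr) => { $crate::json_types::Json::Boolean($b) };
    }
}

mod caller {
    // No `use` of Json here; json_bool! is in scope thanks to #[macro_use].
    pub fn make() -> String {
        format!("{:?}", json_bool!(true))
    }
}
```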

Avoiding Syntax Errors During Matching

macro_rules! tries its rules in order, top to bottom, and uses the first one that matches. So put the more specific rules first; otherwise a broader rule might catch input it wasn't meant for, even if that match then results in a syntax error.
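A tiny sketch of why order matters, using a hypothetical describe! macro. With the null rule first, null gets its own treatment; if the two rules were swapped, $other:tt would also match null and the specific rule would never be reached.

```rust
// Hypothetical macro: specific rule first, catch-all last.
macro_rules! describe {
    (null) => { "the null keyword" };
    ($other:tt) => { "some other token" };
}
```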

Beyond macro_rules!

For more on writing macros, see The Little Book of Rust Macros.

Procedural macros are a separate macro mechanism. They're invoked via attributes; when the book was written only derive was stable, with more available in nightlies (attribute-like and function-like procedural macros have since been stabilized). Unlike macro_rules! pattern matching, they're implemented as Rust functions and operate directly on streams of tokens.

The book doesn't venture much into procedural macros. There are online docs for those, and a helpful tutorial is available here.

Coda

Macros seem like an awesome way to generate boilerplate code, add zero-cost abstractions and generally add more dynamism to Rust. Not without cost of course – more DSLs and complexity all around, and it's not like Rust is short of complexity to begin with. Debugging generated code usually also isn't the most pleasant thing to do.