Making your Macros Hygienic

In Writing Your First Macro, we went through how to use basic tools such as quasiquotes and Walkers to perform simple AST transforms. In this section, we will go through the shortcomings of these naive transforms, and how to use hygiene to make your macros more robust.

Hygienic macros are macros which will not accidentally shadow an identifier, or have the identifiers they introduce shadowed by user code. For example, the Quick Lambdas macro takes this:

func = f[_ + 1]
print(func(1))
# 2

And turns it into a lambda expression. If we did it naively, as we did in Writing Your First Macro, we might expand it into this:

func = lambda arg0: arg0 + 1
print(func(1))
# 2

However, if we introduce a variable called arg0 in the enclosing scope:

arg0 = 10
func = f[_ + arg0]
print(func(1))
# 2
# should print 11

It does not behave as we might expect; we probably want it to produce 11. This is because the arg0 identifier introduced by the f macro shadows the arg0 in our enclosing scope. These bugs can be hard to find, since renaming variables can make them appear or disappear. Try executing the code in docs/examples/hygiene/hygiene_failures to see this for yourself.
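The failure can be reproduced without MacroPy by writing out by hand what such an expansion would look like. This is only a sketch: it assumes the naive expansion names its parameter arg0, as in the example above, and the names naive, intended and fresh_param are made up for illustration.

```python
arg0 = 10

# What a naive expansion of f[_ + arg0] would look like: the generated
# parameter name collides with the user's variable and shadows it.
naive = lambda arg0: arg0 + arg0

# What the user meant: a parameter whose name does not collide, so the
# body still refers to the enclosing arg0.
intended = lambda fresh_param: fresh_param + arg0

print(naive(1))     # 2
print(intended(1))  # 11
```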

gen_sym

There is a way out of this: if you create a new variable, but use an identifier that has not been used before, you don’t run the risk of accidentally shadowing something you didn’t intend to. To help with this, MacroPy provides the gen_sym function, which you can acquire by adding an extra parameter named gen_sym to your macro definition:

@macros.expr
def f(tree, gen_sym, **kw):
    ...
    new_name = gen_sym()
    ... use new_name ...

gen_sym is a function which produces a new identifier (as a string) every time it is called. Each identifier is guaranteed not to appear anywhere in the original source code, and not to have been produced by an earlier call to gen_sym. You can thus use these identifiers without worrying about shadowing an identifier someone was using. The full code for this is given in docs/examples/hygiene/gen_sym, so check it out and try executing it to see it working.
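To make the guarantee concrete, here is a minimal sketch of how a generator of fresh names could be built with the standard ast module. It is not MacroPy's implementation: make_gen_sym and the sym prefix are hypothetical, and the scan is simplified to variable and argument names only.

```python
import ast
import itertools

def make_gen_sym(source):
    """Return a function that produces names not present in `source`."""
    # Simplified scan: only variable references and function arguments
    # are collected, which is enough for this illustration.
    used = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            used.add(node.id)
        elif isinstance(node, ast.arg):
            used.add(node.arg)
    counter = itertools.count()

    def gen_sym():
        # Skip candidates that are already taken, then remember the new
        # name so that later calls never return it again.
        while True:
            candidate = "sym%d" % next(counter)
            if candidate not in used:
                used.add(candidate)
                return candidate

    return gen_sym

gen_sym = make_gen_sym("sym0 = 10\nprint(sym0 + sym2)")
names = [gen_sym() for _ in range(3)]
print(names)  # ['sym1', 'sym3', 'sym4']
```

The names sym0 and sym2 are skipped because they already occur in the scanned source.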

Hygienic Quasiquotes

Let’s look at another use case: the implementation of the various Tracing macros. These macros generally can’t rely solely on AST transforms, but also require runtime support in order to operate. Consider a simple log macro:

# macro_module.py
from macropy.core.macros import Macros
from macropy.core.quotes import macros, q, u, ast_literal

macros = Macros()

@macros.expr
def log(tree, exact_src, **kw):
    new_tree = q[wrap(u[exact_src(tree)], ast_literal[tree])]
    return new_tree

def wrap(txt, x):
    print(txt + " -> " + repr(x))
    return x

This macro aims to perform a conversion like:

log[1 + 2 + 3] -> wrap("1 + 2 + 3", 1 + 2 + 3)

Where the wrap function then prints out both the source code and the repr of the logged expression. This is but a single example of the myriad of things that expanded macros may need at run time.

Naively performing this transform runs into problems:

# test.py
from macro_module import macros, log

log[1 + 2 + 3]
# NameError: name 'wrap' is not defined

This is because although wrap is available in macro_module.py, it is not available in test.py. Hence the expanded code fails when it tries to reference wrap. There are several ways to make wrap available to the expanded code:

Manual Imports

# test.py
from macro_module import macros, log, wrap

log[1 + 2 + 3]
# 1 + 2 + 3 -> 6

You can simply import wrap from macro_module.py into test.py, along with the log macro itself. This way, the expanded code has a wrap function that it can call. Although this works in this example, it is somewhat fragile in the general case, as the programmer could easily accidentally create a variable named wrap, not knowing that it was being used by log (after all, you can’t see it used anywhere in the source code!), causing it to fail:

# test.py
from macro_module import macros, log, wrap

wrap = "chicken salad"

log[1 + 1]
# TypeError: 'str' object is not callable

Alternately, the programmer could simply forget to import it, for the same reason:

# test.py
from macro_module import macros, log

log[1 + 1]
# NameError: name 'wrap' is not defined

which gives a rather confusing error message: wrap is not defined? From the programmer’s perspective, wrap isn’t used at all! These very common pitfalls mean you should probably avoid this approach in favor of the two approaches described below.
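Both failure modes can be reproduced in plain Python by evaluating the expanded code against different namespaces. This is only a sketch: each dictionary stands in for the global scope of test.py, and try_expansion is a made-up helper for illustration.

```python
def wrap(txt, x):
    print(txt + " -> " + repr(x))
    return x

# The code that log[1 + 1] expands into, written out as a string.
expanded = 'wrap("1 + 1", 1 + 1)'

def try_expansion(namespace):
    """Evaluate the expanded code; return its value or the error's type name."""
    try:
        return eval(expanded, namespace)
    except (NameError, TypeError) as exc:
        return type(exc).__name__

ok = try_expansion({"wrap": wrap})                   # wrap was imported
shadowed = try_expansion({"wrap": "chicken salad"})  # wrap was rebound by the user
missing = try_expansion({})                          # wrap was never imported
print(ok, shadowed, missing)  # 2 TypeError NameError
```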

hq

# macro_module.py
from macropy.core.macros import Macros
from macropy.core.quotes import macros, ast_literal
from macropy.core.hquotes import macros, hq, u

macros = Macros()

@macros.expr
def log(tree, exact_src, **kw):
    new_tree = hq[wrap(u[exact_src(tree)], ast_literal[tree])]
    return new_tree

def wrap(txt, x):
    print(txt + " -> " + repr(x))
    return x

# test.py
from macro_module import macros, log

wrap = 3 # try to confuse it

log[1 + 2 + 3]
# 1 + 2 + 3 -> 6
# it still works despite trying to confuse it with `wrap`

The important changes in this snippet, as compared to the previous, are:

  • The removal of wrap from the import statement.
  • Replacement of q with hq

hq is the hygienic quasiquote macro. Unlike traditional quasiquotes (q), hq jumps through some hoops to ensure that the wrap you are using inside the hq[...] expression really does refer to the wrap that is in scope at the macro definition point, not at the macro expansion point (as would be the case with the normal q macro). The end result is that wrap refers to the wrap you want in macro_module.py, and not whatever wrap happened to be defined in test.py. See docs/examples/hygiene/hygienic_quasiquotes to see it working.
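One way to picture the effect is to have the expansion refer to wrap through a fresh name instead of the literal name wrap. The sketch below illustrates that idea in plain Python and should not be read as a description of how hq is implemented; the name sym0 and the user_namespace dictionary are assumptions made for the example.

```python
def wrap(txt, x):
    print(txt + " -> " + repr(x))
    return x

# Hypothetical fresh name, of the kind gen_sym would hand out.
fresh = "sym0"

# The expansion refers to the fresh name rather than to "wrap".
expanded = '%s("1 + 2 + 3", 1 + 2 + 3)' % fresh

# The user's module rebinds wrap, but the fresh name still points at the
# function from the macro module.
user_namespace = {"wrap": 3, fresh: wrap}
result = eval(expanded, user_namespace)
print(result)  # 6
```

Because the user never sees or types sym0, nothing in test.py can rebind it by accident.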

In general, hq allows you to refer to anything that is in scope where hq is being used. Apart from module-level global variables and functions, this includes things like locally scoped variables, which will be properly saved so they can be referred to later even when the macro has completed:

# macro_module.py
@macros.block
def expand(tree, gen_sym, **kw):
    v = 5
    with hq as new_tree:
        return v
    return new_tree

# test.py
def run():
    x = 1
    with expand:
        pass

print(run()) # prints 5

In this case, the value of v is captured by the hq, such that even when expand has returned, it can still be used to return 5 to the caller of the run() function.

Breaking Hygiene

By default, all top-level names in the hq[...] expression (this excludes things like the contents of the u[], name[] and ast_literal[] unquotes) are hygienic, and are bound to the variable of that name at the macro definition point. This means that if you want a name to bind to some variable at the macro expansion point, you can always manually break hygiene by using the name[] or ast_literal[] unquotes. The hq macro also provides an unhygienic[...] unquote to streamline this common requirement:

@macros.block
def expand(tree, gen_sym, **kw):
    v = 5
    with hq as new_tree:
        # all these do the same thing, and will refer to the variable named
        # 'v' wherever the macro is expanded
        return name["v"]
        return ast_literal[Name(id="v")]
        return unhygienic[v]
    return new_tree

Although all these do the same thing, you should prefer to use unhygienic[...] as it makes the intention clearer than using name[...] or ast_literal[...] with hard-coded strings.
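What an unhygienic reference amounts to can be seen with the standard ast module alone: a bare Name node carries only the identifier, so it is resolved in whatever scope the generated code ends up running in. This is a sketch in plain Python without MacroPy, and the expansion_scope dictionary is a stand-in for the scope at the macro expansion point.

```python
import ast

v = 5  # the macro author's v; the bare Name node below knows nothing about it

# A bare Name node, like the one passed to ast_literal[...] above.
node = ast.Expression(body=ast.Name(id="v", ctx=ast.Load()))
ast.fix_missing_locations(node)
code = compile(node, "<expansion>", "eval")

# The name is looked up in the scope where the code is evaluated.
expansion_scope = {"v": "user value"}
resolved = eval(code, expansion_scope)
print(resolved)  # user value
```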

expose_unhygienic

Going back to the log example:

# macro_module.py
from macropy.core.macros import Macros
from macropy.core.quotes import macros, ast_literal
from macropy.core.hquotes import macros, hq, u, unhygienic

macros = Macros()

@macros.expr
def log(tree, exact_src, **kw):
    new_tree = hq[wrap(unhygienic[log_func], u[exact_src(tree)], ast_literal[tree])]
    return new_tree


def wrap(printer, txt, x):
    printer(txt + " -> " + repr(x))
    return x

@macros.expose_unhygienic
def log_func(txt):
    print(txt)

expose_unhygienic is a hybrid between manual importing and hq. Like manual importing, decorating functions with expose_unhygienic causes them to be imported under their unmodified name, meaning they can shadow and be shadowed by other identifiers in the macro-expanded code. Like hq, it does not require the source file using the macros to put the identifier in the import list. This helps match what users of the macro expect: since the name never appears anywhere in the source, it doesn’t make sense for the macro to require the name to be imported in order to work.

In this example, the log macro uses expose_unhygienic on a log_func function. The macro-expanded code by default will capture the log_func function imported from macro_module.py, which prints the log to the console:

# test.py
from macro_module import macros, log

log[1 + 1]
# 1 + 1 -> 2

But a user can intentionally shadow log_func in order to redirect the logging, for example to a list:

# test.py
from macro_module import macros, log

buffer = []
def log_func(txt):
    buffer.append(txt)

log[1 + 2 + 3]
log[1 + 2]
# doesn't print anything

print(buffer)
# ['1 + 2 + 3 -> 6', '1 + 2 -> 3']

See docs/examples/hygiene/unhygienic to see this example in action. In general, expose_unhygienic is useful when you want the macro to use a name that can be intentionally shadowed by the programmer using the macro, allowing the programmer to implicitly modify the behavior of the macro via this shadowing.
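The combination of a hygienic wrap and a shadowable log_func can be mimicked in plain Python with a namespace that supplies a default which the user may override. This is only an illustration of the scoping behavior, not of MacroPy's mechanism; expand_and_run, default_log_func and sym0 are hypothetical names.

```python
def wrap(printer, txt, x):
    printer(txt + " -> " + repr(x))
    return x

def default_log_func(txt):
    print(txt)

def expand_and_run(user_namespace):
    # log_func is supplied under its own name, so the user's definition
    # (if any) replaces the default.
    namespace = {"log_func": default_log_func}
    namespace.update(user_namespace)
    # wrap is supplied under a fresh name that the user cannot clash with.
    namespace["sym0"] = wrap
    return eval('sym0(log_func, "1 + 2", 1 + 2)', namespace)

buffer = []
expand_and_run({})                           # prints: 1 + 2 -> 3
expand_and_run({"log_func": buffer.append})  # prints nothing
print(buffer)  # ['1 + 2 -> 3']
```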


This section has covered how to use the various tools available (gen_sym, hq, expose_unhygienic) in order to carefully control the scoping and variable binding in the code generated by macros. See the section on Hygiene for a more detailed explanation of what’s going on behind the scenes.