A bit different metaprogramming

At Polystream we often work on API surfaces. That surface can be big and require lots of tedious boilerplate code. As an example, D3D11’s ID3D11DeviceContext interface has 108 methods. This is a one time cost per API, so it’s not necessarily the end of the world, but:

  • we have many APIs with many versions and updates
  • we might want to refactor our code

What would we want to do with every function, method, and interface? We could gather statistics, do logging, check input, etc. Lots of that code can be automated with meta-programming using C preprocessor or C++ templates, but it still requires a manual entry for every interface and method.

Other than that:

  • C and C++ macros are simple text preprocessing and we lose the benefits of type checking,
  • C++ template metaprogramming is an art of its own, with byzantine error messages and mind-boggling syntax.

An alternative is to parse the headers or IDL files and work directly on language syntax tree nodes.

Where to begin…

The lexer and parser I used during uni courses (flex and bison) are C based old-school code and I hoped I can find something more modern and fun to work with.

Another thing I tried was Lemon C parser and stb C lexer. My productivity still wasn’t as good as I hoped.

Finally, I stumbled upon Lark parsing library (https://github.com/lark-parser/lark – You don’t have permissions to view try another account). It’s a Python parser implementing Earley algorithm and providing convenient tools to traverse resulting syntax tree. Lark combined with python gives fast iteration times, rapid grammar prototyping and makes the job much more enjoyable.

How to grow a tree

With lark we work in 2 stages: we specify a grammar and we write a tree transformer. We can also iterate on the grammar to inline rules or make the tree more suitable for our needs. This cheat sheet is incredibly helpful (lark_cheatsheet.pdf).

Let’s say we want to parse this code:

interface IInterface : IUnknown
    int CallMe();

Extremely simple lark grammar able to parse it:

start: "interface" CNAME ":" CNAME "{" CNAME CNAME "(" ")" ";" "}"
%import common.CNAME
%import common.WS
%ignore WS

Lark has pre-defined regex helpers for common patterns, such as whitespaces, c-style names, numbers and more.

Applying the grammar on the examples gives the following result:


While we can parse the example, there are severe limitations:

  • support for only one interface, class and method
  • we can’t easily map parsed tokens to their meaning
  • we can’t have method with arguments
  • the tree isn’t much of a tree, as we can’t branch

start: interface*
interface: "interface" name ":" base_interface ("," base_interface)* "{" function_declaration* "}"
name: CNAME
base_interface: CNAME
function_declaration: type name "(" [function_argument ("," function_argument)*] ")" ";"
type: CNAME
function_argument: type name
  • we support many interfaces
  • we support multiple inheritance
  • we support method arguments
  • we can easily distinguish class name from it’s base name

For an example code:

interface IInterface : IUnknown, IKnown
    int CallMe(int x, float y);

We get the following tree:

    name	IInterface
    base_interface	IUnknown
    base_interface	IKnown
      type	int
      name	CallMe
        type	int
        name	x
        type	float
        name	y

It is fairly easy to implement entire C grammar in Lark. C++ is a nastier beast, but a subset of the language can be parsed without being a compiler expert. For more advanced needs using specialized C++ parsers, such as libclang, might be necessary.

How to climb a tree

The result of a successful parsing is a lark syntax tree. Lark provides helper classes to manipulate the tree:

  • a transformer
  • a visitor
  • an interpreter.

The Transformer and visitor are working bottom-up (from the leaves to the root). The Interpreter works top-down (the root to the leaves).

I found the transformer to be the most straightforward way to convert a tree to a (simplified) C++ syntax tree.

With the lark transformer, the leaves can be converted to attributes and nodes can be converted to python objects. Python’s setattr and getattr calls can decorate our node objects (interfaces, functions, etc.) with attributes while traversing from the bottom.

def aggregate_attrs(object, items):
    tuples = [x for x in items if isinstance(x, tuple)]
    for t in tuples:
        if hasattr(object, t[0]) and isinstance(getattr(object, t[0]), list):
            getattr(object, t[0]).extend(t[1])
            setattr(object, t[0], t[1])
    return object
class Interface:
    def pretty(self):
        s = 'class {} : {}\n'.format(self.name, ', '.join(['public ' + x for x in self.base]))
        s += '{\n'
        for f in self.func:
            s += f.pretty()
        s += '};\n'
        return s
class Function:
  def pretty(self):
        s = '\t{} {}('.format(self.type, self.name)
        s += ', '.join([a.pretty() for a in self.arg])
        s += ');\n'
        return s
class Argument:
    def pretty(self):
        s = '{} {}'.format(self.type, self.name)
        return s
class MyTransformer(Transformer):
    def name(self, items):
        return ('name', items[0].value)
    def base_interface(self, items):
        return ('base', [items[0].value])
    def type(self, items):
        return ('type', items[0].value)
    def function_argument(self, items):
        arg = Argument()
        aggregate_attrs(arg, items)
        return ('arg', [arg])
    def function_declaration(self, items):
        func = Function()
        aggregate_attrs(func, items)
        return ('func', [func])
    def interface(self, items):
        interface = Interface()
        interface.type = 'interface'
        aggregate_attrs(interface, items)
        return interface

Printing of the parsed interface inverses the operation:

class IInterface : public IUnknown, public IKnown
	int CallMe(int x, float y);

Parsing a code to AST and converting it back to the original is a good sanity check, but it’s just a base for other use cases.

New worlds

Those simple examples are an introduction to Lark and an approachable way to play with metaprogramming. My favourite aspects of the Lark-python combo are flexibility and rapid iterations with quick feedback.