Numba IR Stages

To make Numba usable as a general-purpose compiler, we provide several entry points. These entry points also allow for further decoupling of the Numba architecture. We propose the following sequence of Intermediate Representations (IRs), from high-level to low-level:

  • The Python AST IR (input to a numba frontend)
  • Initial Python-like IR
  • Untyped IR in SSA form
  • Typed IR in SSA form
  • Low-level IR in SSA form
  • Final LLVM IR, the final input for LLVM. This IR is not portable, since the sizes of types are fixed.

All IRs except the last are portable across machine architectures.

Intermediate Representations

Each IR consists of two layers, namely a higher-level Abstract Syntax Tree encoding, specified, verified and generated by our variant of ASDL schemas (NBDL?). In these schemas, the symbol @ signals that the given type is to be treated as a “unique” object, i.e. one that compares by identity as opposed to structural equality. Each such object carries a unique id and is serialized in a table, which allows it to participate in circular references (i.e. in a graph).

Serialization to LLVM IR (or a direct textual serialization) will consist of something like a generated table, e.g.:

>>> mod = schema.parse("foo = @Foo(str name, foo attr)")

>>> foo1 = mod.Foo("foo1")
>>> foo2 = mod.Foo("foo2", foo1)
>>> foo1.attr = foo2

>>> foo1
Foo(name="foo1", attr=Foo("foo2", Foo(name="foo1", attr=...)))

>>> build_llvm_ir(foo1)   # name,  id,    attr
!0 = metadata !{ metadata !"foo1", i64 0, i64 1 }
!1 = metadata !{ metadata !"foo2", i64 1, i64 0 }
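The table-based serialization above can be sketched in plain Python. This is only an illustration under stated assumptions: the Foo class below is a hand-written stand-in for a schema-generated class, and build_table is a hypothetical helper, not the build_llvm_ir function from the example.

```python
class Foo:
    """Hand-written stand-in for a schema-generated '@' (unique) node."""

    def __init__(self, name, attr=None):
        self.name = name
        self.attr = attr


def build_table(root):
    """Give every reachable object an id and emit one (name, id, attr_id) row.

    Objects are keyed by identity, so circular references terminate:
    an object that already has an id is never visited again.
    """
    ids = {}      # id(obj) -> table index
    order = []    # objects in discovery order
    stack = [root]
    while stack:
        obj = stack.pop()
        if obj is None or id(obj) in ids:
            continue
        ids[id(obj)] = len(order)
        order.append(obj)
        stack.append(obj.attr)

    rows = []
    for obj in order:
        attr_id = ids[id(obj.attr)] if obj.attr is not None else None
        rows.append((obj.name, ids[id(obj)], attr_id))
    return rows


foo1 = Foo("foo1")
foo2 = Foo("foo2", foo1)
foo1.attr = foo2          # close the cycle

table = build_table(foo1)
```

Each row refers to other objects by id rather than by nesting, which is what lets a cyclic graph be written out as a flat table.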

Attributes may also be hidden using \, which means the attribute is not considered a child for the purposes of visitors or term rewrites:

foo = @Foo(str name, foo \attr)

Use of Schemas

We can use our schemas to generate Python AST classes, which can also verify the typing (using TypedProperty). Each node can furthermore implement its own visit method, allowing for quick visitor dispatch (compile with Cython or pre-compile with Numba).
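As a rough sketch of what such generated classes might look like, the following uses a descriptor to check attribute types and a per-node visit method for dispatch. The implementation of TypedProperty and the Num, BinOp and Evaluator classes are assumptions made for illustration; the text above names TypedProperty but does not specify it.

```python
class TypedProperty:
    """Descriptor that rejects values of the wrong type (hypothetical sketch)."""

    def __init__(self, expected_type):
        self.expected_type = expected_type

    def __set_name__(self, owner, name):
        self.public_name = name
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        if not isinstance(value, self.expected_type):
            raise TypeError(
                f"{self.public_name} must be {self.expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
        setattr(obj, self.private_name, value)


class Node:
    """Base class for the illustrative AST nodes."""


class Num(Node):
    value = TypedProperty(int)

    def __init__(self, value):
        self.value = value

    def visit(self, visitor):
        # Each node knows which visitor method handles it.
        return visitor.visit_Num(self)


class BinOp(Node):
    left = TypedProperty(Node)
    op = TypedProperty(str)
    right = TypedProperty(Node)

    def __init__(self, left, op, right):
        self.left = left
        self.op = op
        self.right = right

    def visit(self, visitor):
        return visitor.visit_BinOp(self)


class Evaluator:
    """Example visitor that evaluates Add and Mul expressions."""

    def visit_Num(self, node):
        return node.value

    def visit_BinOp(self, node):
        left = node.left.visit(self)
        right = node.right.visit(self)
        return left + right if node.op == "Add" else left * right


tree = BinOp(Num(1), "Add", BinOp(Num(2), "Mul", Num(3)))
result = tree.visit(Evaluator())
```

Because dispatch happens in the node's own visit method, the visitor does not need to look up a handler by class name at runtime.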

We can also generate code that automatically maps schema instances to opaquely typed LLVM IR, in which the abstract syntax is emitted in post-order. E.g. a + b * c becomes:

!0 = metadata !{ metadata !"operator", i8* "Mul" }
!1 = metadata !{ metadata !"operator", i8* "Add" }

define i8* @some_func(i8* %a, i8* %b, i8* %c) {
entry:
  %0 = call i8* @numba.ir.BinOp(i8* %b, metadata !{0}, i8* %c)
  %1 = call i8* @numba.ir.BinOp(i8* %a, metadata !{1}, i8* %0)
  ret i8* %1
}
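The post-order mapping shown above can be imitated with Python's standard ast module. This is a minimal sketch: emit_postorder is a hypothetical helper, the output is a simplified textual form rather than real LLVM IR, and only names and binary operators are handled.

```python
import ast


def emit_postorder(source):
    """Emit one BinOp line per binary operation, operands before operators."""
    tree = ast.parse(source, mode="eval").body
    lines = []

    def emit(node):
        if isinstance(node, ast.Name):
            return "%" + node.id
        if isinstance(node, ast.BinOp):
            # Post-order: both operands are emitted before the operation itself.
            left = emit(node.left)
            right = emit(node.right)
            result = "%" + str(len(lines))
            # Python's ast names the operator classes Add, Mult, Sub, ...
            op = type(node.op).__name__
            lines.append(f"{result} = BinOp({left}, {op}, {right})")
            return result
        raise NotImplementedError(type(node).__name__)

    emit(tree)
    return lines


lines = emit_postorder("a + b * c")
for line in lines:
    print(line)
```

The multiplication is emitted first because it is a child of the addition, which matches the ordering of %0 and %1 in the LLVM listing.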

The LLVM IR also has no control flow, i.e. an if statement will generate IR along the following lines:

%0 = if_body
%1 = call void Jump(metadata !{0})
%2 = else_body
%3 = call void Jump(metadata !{0})

Initial Python-like IR

The initial, Python-like IR is a subset of the Python AST. Its syntax excludes:

  • FunctionDef and ClassDef, which are normalized to an Assign of the function, followed by decorator applications and assignments
  • List, dict and set comprehensions and generator expressions, which are normalized to For(...) etc. plus method calls

The initial IR is what numba decorators produce given a pure Python AST, function or class as input.
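To make the two normalizations concrete, the sketch below writes out by hand what an equivalent normalized form could look like at the source level. It only illustrates the idea; the function names are made up, and the actual rewriting would happen on AST nodes rather than on source text.

```python
def double_result(func):
    """Example decorator used for both forms below."""
    return lambda x: 2 * func(x)


# Original form: a FunctionDef with a decorator.
@double_result
def decorated(x):
    return x + 1


# Normalized form: an assignment of the function value,
# followed by the decorator application.
normalized = lambda x: x + 1
normalized = double_result(normalized)


# Original form: a list comprehension.
def squares_comprehension(values):
    return [v * v for v in values if v % 2 == 0]


# Normalized form: a For loop plus method calls.
def squares_normalized(values):
    result = []
    for v in values:
        if v % 2 == 0:
            result.append(v * v)
    return result


data = [1, 2, 3, 4]
```

Both pairs behave identically, so later pipeline stages only need to understand assignments, loops and calls.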

Sample schema:

module initial {

    mod = NumbaModule(unit* stats)

    unit
      = lambda
      | class

    -- functions --
    lambda
      = Lambda(posinfo pos, funcmeta meta, str name, arguments args,
               expr body)

    funcmeta
      = FunctionMetaData(
            -- locals={'foo': double}
            str* names,     -- 'foo'
            nbtype* types,  -- double
            bool nopython,
        )

    -- classes --
    class
      = ClassExpr(posinfo pos, bool is_jit, attrtable table, method* methods)

    attrtable
      = AttributeTable(str* attrnames, nbtype* attrtypes)

    method
      = Method(posinfo pos, methodsignature signature, stat* body)

    -- Types --

    type = nbtype
    nbtype
      = char | short | int_ | long_ | longlong
      | uchar | ushort | uint | ulong | ulonglong
      | ...
      | functype
      | methodtype

    methodtype
      = MethodSignature(functype signature,
                        bool is_staticmethod,
                        bool is_classmethod,
                        bool is_jit, -- whether this is a jit or
                                     -- autojit method
                       )
}

Note

Numba would construct this before starting any pipeline stage.

Untyped IR in SSA form

Untyped IR in SSA form would be constructed internally by numba during and after the CFA pass and before type inference. This adds control flow information to the initial schema, such as:

  • SSA
  • Stack allocation for stack variables (non-SSA variables)
  • Jumps
  • Def-use and use-def chains

Furthermore:

  • ast.Name is rewritten to NameTarget, NameReference or NameParam
  • If, While and For lose the else clause
  • Break and Continue are rewritten to Jump
  • Every block but the exit block must be terminated with a Jump to a new block

module untyped {

    cfg
      = CFG(block* blocks, -- List of blocks in pre-order
            block entry,   -- Entry block of CFG
           )

    block
      = @ControlBlock(phi* phis, block* \parents, block* \children)

    phi
      = Phi(use* \incoming)

    def
      = NameTarget(posinfo pos, str id, use* \uses)
      | phi

    use
      = NameReference(posinfo pos, str id, nbtype type, def \def)
      | PhiRef(phi \def)

    lambda
      = Lambda(posinfo pos, funcmeta meta, str name, arguments args,
               expr body, cfg cfg)

    stmt
      = For(block prev_block,
            block body_block,
            block exit_block,
            stmt* body)
      | ...

    jump
      = Jump(block \dst_block)
}
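The rewrite of ast.Name into NameTarget, NameReference or NameParam can be approximated with the standard ast module. This is a sketch under assumptions: classify_names is a hypothetical helper that only labels names instead of building new nodes, and in Python 3 parameters appear as ast.arg nodes rather than ast.Name nodes.

```python
import ast


def classify_names(source):
    """Label every name in a function as NameParam, NameTarget or NameReference."""
    func = ast.parse(source).body[0]
    found = []
    for node in ast.walk(func):
        if isinstance(node, ast.arg):
            # Function parameters (ast.arg in Python 3).
            found.append((node.lineno, node.col_offset, node.arg, "NameParam"))
        elif isinstance(node, ast.Name):
            # A Store context is a definition; anything else is treated as a use.
            kind = "NameTarget" if isinstance(node.ctx, ast.Store) else "NameReference"
            found.append((node.lineno, node.col_offset, node.id, kind))
    # ast.walk is breadth-first, so sort by source position for readable output.
    found.sort()
    return [(name, kind) for _, _, name, kind in found]


source = "def f(x):\n    y = x + 1\n    return y\n"
names = classify_names(source)
```

Separating definitions from uses in this way is the starting point for building the def-use and use-def chains described in the schema.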

Typed IR in SSA form

The typed IR is similar to the untyped IR, except that every (sub-)expression is annotated with a type. Furthermore, the CFG is augmented with outgoing Promotion terms, which promote a variable for a merge in a subsequent CFG block. E.g.:

# y_0
if x > 10:
    # block_if
    y = 2           # y_1
else:
    # block_else
    y = 3.0         # y_2

In the example above, block_if will contain a Promotion with a use of y_1, replacing all uses of y_1 with the promotion value (which can only ever be a single phi node).
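The promotion step for this example can be sketched as follows. The promotion order (int before float before complex) and the dictionary layout are assumptions for illustration; the document does not specify how promotion rules are encoded.

```python
# Assumed promotion order: a type later in the list can represent earlier ones.
PROMOTION_ORDER = ["int", "float", "complex"]


def promote(*types):
    """Return the widest of the given types under the assumed order."""
    return max(types, key=PROMOTION_ORDER.index)


# Definitions of y reaching the merge point, keyed by the block they come from.
incoming = {
    "block_if": ("y_1", "int"),      # y = 2
    "block_else": ("y_2", "float"),  # y = 3.0
}

# The phi node at the merge takes the promoted type of all incoming values.
phi_type = promote(*(t for _, t in incoming.values()))

# A block needs a Promotion only if its value differs from the phi type.
promotions = {
    block: (var, var_type, phi_type)
    for block, (var, var_type) in incoming.items()
    if var_type != phi_type
}
```

Under these assumptions only block_if needs a Promotion, since y_2 already has the type of the phi node.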

The types themselves also adhere to a schema, e.g.:

type
  = Array(type dtype, int ndim)
  | Pointer(type base_type, int? size)
  | ...
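One way to picture classes generated from this type schema is as frozen dataclasses, which compare structurally by default. This is a sketch only: the Named leaf type is a placeholder added so the example can run, not a reconstruction of the elided alternatives, and the validation in __post_init__ is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


class Type:
    """Base class for all illustrative schema types."""


@dataclass(frozen=True)
class Named(Type):
    """Placeholder leaf type for demonstration (not part of the schema excerpt)."""
    name: str


@dataclass(frozen=True)
class Array(Type):
    dtype: Type
    ndim: int

    def __post_init__(self):
        # Verify the field types declared in the schema.
        if not isinstance(self.dtype, Type):
            raise TypeError("dtype must be a Type")
        if not isinstance(self.ndim, int):
            raise TypeError("ndim must be an int")


@dataclass(frozen=True)
class Pointer(Type):
    base_type: Type
    size: Optional[int] = None   # 'int? size' in the schema: optional


double = Named("double")
a = Array(double, 2)
b = Array(Named("double"), 2)
p = Pointer(double)
```

Here a and b are distinct objects that compare equal, which is the structural equality that types not marked with @ are described as having.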

Since the schema specifies the interfaces of the different nodes, users can supply their own node implementation (something we can do with the type system). Hence user-written classes can be automatically instantiated instead of generated ones. The code generator can still emit code for serialization.

Low-level Portable IR

The low-level portable IR is a low-level, platform-agnostic IR that:

  • Has erased all control flow structures such as if, while and for
  • Contains a low-level control flow graph embedded in the AST
  • Contains all branches down to the final LLVM IR level
    • i.e. the LLVM code generator cannot add basic blocks or branches
    • This means all runtime error handling has to be resolved by this IR
  • Contains only low-level, native types such as int_, long_, pointers, structs, etc. The notion of high-level concepts such as arrays or objects is gone.
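To illustrate what resolving runtime error handling at this level could mean, the sketch below lowers a division into explicit blocks with a zero check. All operation names (eq, cbranch, call, div, jump), the block layout and the raise_zero_division callee are hypothetical; they are not taken from an actual Numba schema.

```python
def lower_division(dst, lhs, rhs):
    """Lower `dst = lhs / rhs` into explicit blocks (hypothetical op names)."""
    return {
        "entry": [
            ("eq", "%is_zero", rhs, "0"),
            ("cbranch", "%is_zero", "error", "ok"),
        ],
        "error": [
            ("call", "raise_zero_division"),
            ("jump", "exit"),
        ],
        "ok": [
            ("div", dst, lhs, rhs),
            ("jump", "exit"),
        ],
        "exit": [],
    }


TERMINATORS = {"jump", "cbranch"}


def check_terminated(blocks, exit_name="exit"):
    """Check that every block except the exit block ends in a branch or jump
    whose targets are all known blocks."""
    for name, ops in blocks.items():
        if name == exit_name:
            continue
        if not ops or ops[-1][0] not in TERMINATORS:
            return False
        last = ops[-1]
        targets = last[1:] if last[0] == "jump" else last[2:]
        if any(target not in blocks for target in targets):
            return False
    return True


blocks = lower_division("%q", "%a", "%b")
ok = check_terminated(blocks)
```

Since the error path is already an explicit block here, a code generator working from this form would only translate blocks one to one and would not need to introduce branches of its own.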
