===============
Numba IR Stages
===============

To allow Numba to be used as a general-purpose compiler, we provide different
entry points. These entry points also allow for further decoupling of the
Numba architecture. We propose a series of Intermediate Representations
(IRs). The stages, from high-level to low-level, are:

* The Python AST IR (input to a numba frontend)
* Initial Python-like IR
* Untyped IR in SSA form
* Typed IR in SSA form
* Low-level IR in SSA form
* Final LLVM IR, the final input for LLVM. This IR is unportable
  since the sizes of types are fixed.

All IRs except the last are portable across machine architectures.

Intermediate Representations
============================

Each IR consists of two layers, namely a higher-level Abstract Syntax Tree
encoding, specified, verified and generated by our variant of ASDL
schemas (NBDL?).

We add the symbol ``@`` to signal that the given type is to be treated as
a "unique" object, i.e. one that compares by identity as opposed to
structural equality. These objects each carry a unique id and are serialized
in a table (and they may participate in circular references, i.e. in a graph).

Serialization to LLVM IR (or a direct textual serialization) will produce
something like a generated table, e.g.::

    >>> mod = schema.parse("foo = @Foo(str name, foo attr)")
    >>> foo1 = mod.Foo("foo1")
    >>> foo2 = mod.Foo("foo2", foo1)
    >>> foo1.attr = foo2
    >>> foo1
    Foo(name="foo1", attr=Foo("foo2", Foo(name="foo1", attr=...)))
    >>> build_llvm_ir(foo1)
    # name, id, attr
    !0 = metadata !{ metadata !"foo1", i64 0, i64 1 }
    !1 = metadata !{ metadata !"foo2", i64 1, i64 0 }

Attributes may also be hidden using ``\``, which means the attribute is not
considered a child for the purposes of visitors or term rewrites::

    foo = @Foo(str name, foo \attr)

Use of Schemas
--------------

We can use our schemas to generate Python AST classes, which can also verify
the typing (using ``TypedProperty``).
Each node can furthermore implement its own ``visit`` method, allowing for
quick visitor dispatch (compile with Cython or pre-compile with Numba).

We can also automatically generate code that maps schema instances to
opaquely typed LLVM IR, which is the abstract syntax emitted in post-order.
E.g. ``a + b * c`` becomes::

    !0 = metadata !{ metadata !"operator", i8* "Mul" }
    !1 = metadata !{ metadata !"operator", i8* "Add" }

    define i8* some_func(i8* %a, i8* %b, i8* %c) {
    entry:
      %0 = call i8* @numba.ir.BinOp(%b, metadata !{0}, %c)
      %1 = call i8* @numba.ir.BinOp(%a, metadata !{1}, %0)
      ret %1
    }

The LLVM IR also has no control flow, i.e. an ``if`` statement will generate
IR along the following lines::

    %0 = if_body
    %1 = call void Jump(metadata !{0})
    %2 = else_body
    %3 = call void Jump(metadata !{0})

Initial Python-like IR
----------------------

The initial, Python-like IR is a subset of the Python AST. The syntax
excludes:

* ``FunctionDef`` and ``ClassDef``, which are normalized to ``Assign`` of
  the function and subsequent decorator applications and assignments
* List, dict, set and generator comprehensions, which are normalized to
  ``For(...)`` etc. plus method calls

The initial IR is what numba decorators produce given a pure Python AST,
function or class as input.

Sample schema::

    module initial {

        mod = NumbaModule(unit* stats)
        unit = lambda | class

        -- functions --

        lambda = Lambda(posinfo pos, funcmeta meta, str name,
                        arguments args, expr body)

        funcmeta = FunctionMetaData(
            -- locals={'foo': double}
            str* names,     -- 'foo'
            nbtype* types,  -- double
            bool nopython,
        )

        -- classes --

        class = ClassExpr(posinfo pos, bool is_jit, attrtable table,
                          method* methods)

        attrtable = AttributeTable(str* attrnames, nbtype* attrtypes)
        method = Method(posinfo pos, methodsignature signature, stat* body)

        -- Types --

        type = nbtype
        nbtype = char | short | int_ | long_ | longlong
               | uchar | ushort | uint | ulong | ulonglong
               | ...
               | functype | methodtype

        methodtype = MethodSignature(functype signature,
                                     bool is_staticmethod,
                                     bool is_classmethod,
                                     bool is_jit, -- whether this is a jit or
                                                  -- autojit method
                                    )
    }

.. NOTE:: Numba would construct this before starting any pipeline stage.

Untyped IR in SSA form
----------------------

Untyped IR in SSA form would be constructed internally by numba during
and after the CFA pass and before type inference. This adds control flow
information to the ``initial`` schema, such as:

* SSA
* Stack allocation of variables (non-SSA variables)
* Jumps
* Def-use and use-def chains

Furthermore:

* ``ast.Name`` is rewritten to ``NameTarget``, ``NameReference`` or
  ``NameParam``
* ``If``, ``While`` and ``For`` lose the ``else`` clause
* ``Break`` and ``Continue`` are rewritten to ``Jump``
* Every block but the exit block must be terminated with a ``Jump``
  to a new block

::

    module untyped {

        cfg = CFG(block* blocks, -- List of blocks in pre-order
                  block entry,   -- Entry block of CFG
                 )

        block = @ControlBlock(phi* phis, block* \parents, block* \children)

        phi = Phi(use* \incoming)

        def = NameTarget(posinfo pos, str id, use* \uses)
            | phi

        use = NameReference(posinfo pos, str id, nbtype type, def \def)
            | PhiRef(phi \def)

        lambda = Lambda(posinfo pos, funcmeta meta, str name,
                        arguments args, expr body, cfg cfg)

        stmt = For(block prev_block, block body_block, block exit_block,
                   stmt* body)
             | ...

        jump = Jump(block \dst_block)
    }

Typed IR in SSA form
--------------------

The typed IR is similar to the untyped IR, except that every (sub-)expression
is annotated with a type.

Furthermore, the CFG is augmented with outgoing ``Promotion`` terms, which
promote a variable for a merge in a subsequent CFG block. E.g.::

    # y_0
    if x > 10:
        # block_if
        y = 2           # y_1
    else:
        # block_else
        y = 3.0         # y_2

In the example above, ``block_if`` will contain a ``Promotion`` with a use
of ``y_1``, replacing all uses of ``y_1`` with the promotion value (which
can only ever be a single phi node).
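
As a rough illustration of the promotion example above, the following minimal
Python sketch computes the type of the merged ``y`` and a ``Promotion`` term
for each incoming definition. Everything here is hypothetical: the ``RANK``
table, ``merge_type`` and ``build_promotions`` are illustrative names, not
part of Numba, and emitting one promotion per incoming definition (even when
the type does not change) is an assumption of the sketch.

.. code-block:: python

    # Hypothetical sketch; none of these names are part of Numba.
    from collections import namedtuple

    # Toy type ranking: a higher rank can represent every lower-ranked type.
    RANK = {"int_": 0, "double": 1}

    Promotion = namedtuple(
        "Promotion", ["block", "variable", "src_type", "dst_type"])


    def merge_type(types):
        """Return the widest type among the incoming definitions."""
        return max(types, key=RANK.__getitem__)


    def build_promotions(incoming):
        """Build the promotions for one phi node.

        ``incoming`` is a list of (block, variable, type) triples, one per
        definition that reaches the merge point.
        """
        phi_type = merge_type([ty for _, _, ty in incoming])
        promotions = [Promotion(block, var, ty, phi_type)
                      for block, var, ty in incoming]
        return phi_type, promotions


    phi_type, promotions = build_promotions([
        ("block_if", "y_1", "int_"),      # y = 2
        ("block_else", "y_2", "double"),  # y = 3.0
    ])

Here ``phi_type`` is ``"double"``, and the promotion recorded for ``y_1``
describes a conversion from ``int_`` to ``double``.
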

All types themselves adhere to a schema, e.g.::

    type
      = Array(type dtype, int ndim)
      | Pointer(type base_type, int? size)
      | ...

Since the schema specifies the interfaces of the different nodes, users can
supply their own node implementation (something we can do with the type
system). Hence user-written classes can be automatically instantiated instead
of generated ones. The code generator can still emit code for serialization.

Low-level Portable IR
=====================

The low-level portable IR is a low-level, platform-agnostic IR that:

* Has erased all control flow structures such as ``if``, ``while`` and
  ``for``
* Contains a low-level control flow graph embedded in the AST
* Contains all branches down to the final LLVM IR level

  * i.e. the LLVM code generator cannot add basic blocks or branches
  * This means all runtime error handling has to be resolved by this IR

* Contains only low-level, native types such as ``int_``, ``long_``,
  pointers, structs, etc. The notion of high-level concepts such as arrays
  or objects is gone.
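
The erasure of structured control flow can be sketched as follows. This
minimal Python example flattens a single ``if`` into labelled blocks that end
in explicit branches or jumps, so that a later code generator would not need
to add blocks of its own. The block labels, the tuple-based instruction
encoding and the ``lower_if`` and ``is_terminated`` helpers are assumptions
made for illustration; they are not Numba APIs.

.. code-block:: python

    # Hypothetical sketch; the encoding and helper names are illustrative.

    def lower_if(test, body, orelse):
        """Flatten one structured ``if`` into blocks with explicit jumps.

        ``body`` and ``orelse`` are lists of already-lowered, straight-line
        instructions (plain tuples in this sketch).
        """
        return {
            # The entry block branches on the test; no ``if`` node remains.
            "entry": [("cbranch", test, "if_body", "else_body")],
            "if_body": list(body) + [("jump", "exit")],
            "else_body": list(orelse) + [("jump", "exit")],
            "exit": [],
        }


    def is_terminated(blocks, exit_label="exit"):
        """Check that every block but the exit block ends in a branch or jump."""
        return all(
            bool(instrs) and instrs[-1][0] in ("jump", "cbranch")
            for label, instrs in blocks.items()
            if label != exit_label
        )


    blocks = lower_if(
        "x > 10",
        [("assign", "y_1", "2")],
        [("assign", "y_2", "3.0")],
    )

After lowering, ``blocks`` holds four labelled blocks and
``is_terminated(blocks)`` holds, which mirrors the requirement that every
block but the exit block ends in a jump.
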