===============
Numba IR Stages
===============

To let Numba serve as a general-purpose compiler, we provide several entry
points. These entry points also allow for further decoupling of the Numba
architecture. We propose the following Intermediate Representations (IRs),
ordered from high-level to low-level:

    * The Python AST IR (input to a numba frontend)
    * Initial Python-like IR
    * Untyped IR in SSA form
    * Typed IR in SSA form
    * Low-level IR in SSA form
    * Final LLVM IR, the final input for LLVM. This IR is not portable
      since the sizes of types are fixed.

All IRs except the last are portable across machine architectures.

Intermediate Representations
============================

Each IR consists of two layers. The higher-level layer is an Abstract
Syntax Tree encoding, specified, verified and generated by our variant of
ASDL schemas (NBDL?). The lower-level layer is a serialization of that
tree, either to LLVM IR or to a direct textual form. We add the symbol
``@`` to signal that the given type is to be treated as a "unique" object,
i.e. one that compares by identity as opposed to structural equality.
These objects each carry a unique id and are serialized in a table (and
they may participate in circular references, i.e. in a graph).

Serialization to LLVM IR (or a direct textual serialization) will
consist of something like a generated table, e.g.::

    >>> mod = schema.parse("foo = @Foo(str name, foo attr)")

    >>> foo1 = mod.Foo("foo1")
    >>> foo2 = mod.Foo("foo2", foo1)
    >>> foo1.attr = foo2

    >>> foo1
    Foo(name="foo1", attr=Foo(name="foo2", attr=Foo(name="foo1", attr=...)))

    >>> build_llvm_ir(foo1)   # name,  id,    attr
    !0 = metadata !{ metadata !"foo1", i64 0, i64 1 }
    !1 = metadata !{ metadata !"foo2", i64 1, i64 0 }
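
The table form above can be sketched in plain Python. The ``Foo`` class
and the ``build_table`` helper below are hypothetical stand-ins, not part
of Numba or of the schema tooling; they only illustrate how
identity-compared nodes with circular references can be flattened into
rows of ids::

    class Foo:
        """Hypothetical stand-in for a node generated from the schema."""

        def __init__(self, name, attr=None):
            self.name = name
            self.attr = attr


    def build_table(root):
        """Assign each reachable node an id; return (name, id, attr_id) rows."""
        ids = {}      # id(node) -> table index, so nodes are keyed by identity
        order = []    # nodes in discovery order
        stack = [root]
        while stack:
            node = stack.pop()
            if node is None or id(node) in ids:
                continue          # already in the table: this breaks cycles
            ids[id(node)] = len(order)
            order.append(node)
            stack.append(node.attr)
        return [
            (n.name, ids[id(n)], None if n.attr is None else ids[id(n.attr)])
            for n in order
        ]


    foo1 = Foo("foo1")
    foo2 = Foo("foo2", foo1)
    foo1.attr = foo2              # close the cycle
    table = build_table(foo1)

Each row mirrors one metadata entry in the listing above: the name, the
node's own id, and the id of the node its ``attr`` refers to.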

Attributes may also be hidden using ``\``, which means the attribute
is not considered a child for the purposes of visitors or term
rewrites::

    foo = @Foo(str name, foo \attr)
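
A minimal sketch of what hiding could mean for a visitor follows. The
``Node`` base class and its ``_fields`` and ``_hidden`` attributes are
assumptions made for illustration; the generated classes may look quite
different::

    class Node:
        """Hypothetical base class for schema-generated nodes."""
        _fields = ()              # all attributes, in schema order
        _hidden = frozenset()     # attributes marked as hidden in the schema

        def children(self):
            """Yield only the attributes that visitors should descend into."""
            for field in self._fields:
                if field in self._hidden:
                    continue
                value = getattr(self, field)
                if isinstance(value, Node):
                    yield value


    class Foo(Node):
        # Mirrors the schema above, where ``attr`` is hidden.
        _fields = ("name", "attr")
        _hidden = frozenset({"attr"})

        def __init__(self, name, attr=None):
            self.name = name
            self.attr = attr


    def count_nodes(node):
        """A trivial visitor: count nodes reachable via non-hidden children."""
        return 1 + sum(count_nodes(child) for child in node.children())


    foo1 = Foo("foo1")
    foo2 = Foo("foo2", foo1)
    foo1.attr = foo2              # circular, but only through a hidden attribute
    visited = count_nodes(foo1)   # terminates because ``attr`` is skipped

Because the cycle runs only through a hidden attribute, the traversal
visits ``foo1`` alone and does not recurse forever.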

Use of Schemas
--------------
We can use our schemas to generate Python AST classes, which can
also verify the typing (using ``TypedProperty``). Each node can
furthermore implement its own ``visit`` method, allowing for quick
visitor dispatch (compile with Cython or pre-compile with Numba).
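
As a rough sketch of both ideas, the descriptor below checks the type of
an attribute on assignment, and the node dispatches to the visitor from
its own ``visit`` method. This ``TypedProperty`` is a hypothetical
reimplementation written for this example, and ``Num`` and ``Evaluator``
are made-up names::

    class TypedProperty:
        """Hypothetical descriptor that type-checks assignments."""

        def __init__(self, expected_type):
            self.expected_type = expected_type

        def __set_name__(self, owner, name):
            self.public_name = name
            self.private_name = "_" + name

        def __get__(self, obj, objtype=None):
            if obj is None:
                return self
            return getattr(obj, self.private_name)

        def __set__(self, obj, value):
            if not isinstance(value, self.expected_type):
                raise TypeError(
                    f"{self.public_name} expects "
                    f"{self.expected_type.__name__}, got {type(value).__name__}"
                )
            setattr(obj, self.private_name, value)


    class Num:
        value = TypedProperty(int)

        def __init__(self, value):
            self.value = value        # goes through TypedProperty.__set__

        def visit(self, visitor):
            # The node picks the visitor method itself, so no name lookup
            # based on the class name is needed at dispatch time.
            return visitor.visit_Num(self)


    class Evaluator:
        def visit_Num(self, node):
            return node.value


    result = Num(3).visit(Evaluator())

Constructing ``Num("three")`` raises ``TypeError`` here, which is the
kind of verification the schema is meant to provide.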

We can also generate mapping code that automatically maps schema
instances to opaquely typed LLVM IR, in which the abstract syntax is
emitted in post-order. E.g. ``a + b * c`` becomes::

    !0 = metadata !{ metadata !"operator", i8* "Mul" }
    !1 = metadata !{ metadata !"operator", i8* "Add" }

    define i8* @some_func(i8* %a, i8* %b, i8* %c) {
    entry:
      %0 = call i8* @numba.ir.BinOp(%b, metadata !{0}, %c)
      %1 = call i8* @numba.ir.BinOp(%a, metadata !{1}, %0)
      ret i8* %1
    }
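
The post-order emission can be illustrated with Python's standard ``ast``
module. This is only a sketch: ``emit_postorder`` is a hypothetical
helper, the output is a simplified textual form rather than real LLVM IR,
and Python's ``ast`` names the multiplication operator ``Mult`` rather
than ``Mul``::

    import ast


    def emit_postorder(source):
        """Return pseudo-IR lines for an expression of names and binary ops."""
        lines = []

        def emit(node):
            if isinstance(node, ast.Name):
                return "%" + node.id
            if isinstance(node, ast.BinOp):
                left = emit(node.left)        # operands are emitted first ...
                right = emit(node.right)
                result = "%" + str(len(lines))
                op = type(node.op).__name__   # e.g. "Add" or "Mult"
                lines.append(f"{result} = BinOp({left}, {op}, {right})")
                return result                 # ... then the operation itself
            raise NotImplementedError(type(node).__name__)

        emit(ast.parse(source, mode="eval").body)
        return lines


    ir_lines = emit_postorder("a + b * c")

The multiplication is emitted before the addition that consumes its
result, matching the order of ``%0`` and ``%1`` in the listing above.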

This LLVM IR encoding also has no control flow of its own; an ``if``
statement, for example, will generate IR along the following lines::

    %0 = if_body
    %1 = call void Jump(metadata !{0})
    %2 = else_body
    %3 = call void Jump(metadata !{0})

Initial Python-like IR
----------------------

The initial, Python-like IR is a subset of the Python AST. The
syntax excludes:

    * ``FunctionDef`` and ``ClassDef``, which are normalized
      to an ``Assign`` of the function, followed by
      decorator applications and assignments
    * List, dict and set comprehensions and generator expressions, which
      are normalized to ``For(...)`` etc. plus method calls
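
The effect of these normalizations can be shown at the source level. The
exact shape the frontend produces is not specified here, so the
"normalized" half below is an assumption that only demonstrates the
equivalence; ``decorate`` and the other names are made up for the
example::

    def decorate(func):
        """Example decorator; stands in for any decorator."""
        func.decorated = True
        return func


    # Source form: a decorated definition and a list comprehension.
    @decorate
    def square(x):
        return x * x

    squares = [square(x) for x in range(4)]


    # Normalized form: the function value is assigned to a name, then the
    # decorator is applied and the result is assigned again.
    def _square_impl(x):
        return x * x

    square_n = _square_impl
    square_n = decorate(square_n)

    # The comprehension becomes a loop plus method calls.
    squares_n = []
    for x in range(4):
        squares_n.append(square_n(x))

Both halves compute the same values, so later stages only need to handle
assignments, loops and calls.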

The initial IR is what numba decorators produce given a pure
Python AST, function or class as input.


Sample schema::

    module initial {

        mod = NumbaModule(unit* stats)

        unit
          = lambda
          | class

        -- functions --
        lambda
          = Lambda(posinfo pos, funcmeta meta, str name, arguments args,
                   expr body)

        funcmeta
          = FunctionMetaData(
                -- locals={'foo': double}
                str* names,     -- 'foo'
                nbtype* types,  -- double
                bool nopython,
            )

        -- classes --
        class
          = ClassExpr(posinfo pos, bool is_jit, attrtable table, method* methods)

        attrtable
          = AttributeTable(str* attrnames, nbtype* attrtypes)

        method
          = Method(posinfo pos, methodsignature signature, stat* body)

        -- Types --

        type = nbtype
        nbtype
          = char | short | int_ | long_ | longlong
          | uchar | ushort | uint | ulong | ulonglong
          | ...
          | functype
          | methodtype

        methodtype
          = MethodSignature(functype signature,
                            bool is_staticmethod,
                            bool is_classmethod,
                            bool is_jit, -- whether this is a jit or
                                         -- autojit method
                           )
    }

.. NOTE:: Numba would construct this before starting any pipeline stage.

Untyped IR in SSA form
----------------------

Untyped IR in SSA form would be constructed internally by numba during
and after the control flow analysis (CFA) pass and before type inference.
This adds control flow information to the ``initial`` schema, such as:

    * SSA
    * Stack allocation for stack variables (non-SSA variables)
    * Jumps
    * Def-use and use-def chains

Furthermore:

    * ``ast.Name`` is rewritten to ``NameTarget``, ``NameReference`` or ``NameParam``
    * ``If``, ``While`` and ``For`` lose the ``else`` clause
    * ``Break`` and ``Continue`` are rewritten to ``Jump``
    * Every block but the exit block must be terminated with a ``Jump`` to a new block
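
The first rewrite can be approximated with the standard ``ast`` module,
using the expression context of each name. ``classify_names`` is a
hypothetical helper, and the mapping from contexts to node kinds is an
assumption about how the rewrite would work::

    import ast


    def classify_names(source):
        """Map each name occurrence in ``source`` to a proposed IR node kind."""
        kinds = []
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.arg):
                kinds.append((node.arg, "NameParam"))
            elif isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Store):
                    kinds.append((node.id, "NameTarget"))
                else:
                    # Load contexts; ``del`` targets also end up here
                    # in this simplified sketch.
                    kinds.append((node.id, "NameReference"))
        return sorted(kinds)


    kinds = classify_names("def f(a):\n    b = a + 1\n    return b\n")

Here ``a`` appears once as a parameter and once as a reference, while
``b`` appears once as a target and once as a reference.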

::

    module untyped {

        cfg
          = CFG(block* blocks, -- List of blocks in pre-order
                block entry,   -- Entry block of CFG
               )

        block
          = @ControlBlock(phi* phis, block* \parents, block* \children)

        phi
          = Phi(use* \incoming)

        def
          = NameTarget(posinfo pos, str id, use* \uses)
          | phi

        use
          = NameReference(posinfo pos, str id, nbtype type, def \def)
          | PhiRef(phi \def)

        lambda
          = Lambda(posinfo pos, funcmeta meta, str name, arguments args,
                   expr body, cfg cfg)

        stmt
          = For(block prev_block,
                block body_block,
                block exit_block,
                stmt* body)
          | ...

        jump
          = Jump(block \dst_block)
    }

Typed IR in SSA form
--------------------

The typed IR is similar to the untyped IR, except that every (sub-)expression
is annotated with a type. Furthermore, the CFG is augmented with outgoing
``Promotion`` terms, which promote a variable for a merge in a subsequent
CFG block. E.g.::

    # y_0
    if x > 10:
        # block_if
        y = 2           # y_1
    else:
        # block_else
        y = 3.0         # y_2

In the example above, ``block_if`` will contain a ``Promotion`` with a use
of ``y_1``, replacing all uses of ``y_1`` with the promotion value (which
can only ever be a single phi node).
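
A small sketch of how promotions might be attached to predecessor blocks
is given below. The ``Block`` and ``Promotion`` classes, the explicit
merge block, and the rule that ``int`` merged with ``double`` yields
``double`` are all assumptions made for this example::

    from dataclasses import dataclass, field


    @dataclass(eq=False)      # compare by identity, like ``@`` nodes
    class Block:
        name: str
        promotions: list = field(default_factory=list)
        phis: list = field(default_factory=list)


    @dataclass
    class Promotion:
        variable: str         # the SSA definition being promoted, e.g. "y_1"
        src_type: str
        dst_type: str


    def promote_for_merge(incoming, merge_block):
        """Add promotions so every incoming definition has the merged type.

        ``incoming`` maps a predecessor block to ``(ssa_name, type_name)``.
        """
        types = {type_name for _, type_name in incoming.values()}
        merged = "double" if "double" in types else "int"   # assumed rule
        for block, (ssa_name, type_name) in incoming.items():
            if type_name != merged:
                block.promotions.append(
                    Promotion(ssa_name, type_name, merged))
        merge_block.phis.append(
            (merged, [name for name, _ in incoming.values()]))
        return merged


    block_if = Block("block_if")
    block_else = Block("block_else")
    block_merge = Block("block_merge")
    merged = promote_for_merge(
        {block_if: ("y_1", "int"), block_else: ("y_2", "double")},
        block_merge,
    )

Only ``block_if`` receives a promotion, since ``y_2`` already has the
merged type.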

All types themselves adhere to a schema, e.g.::

    type
      = Array(type dtype, int ndim)
      | Pointer(type base_type, int? size)
      | ...
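
One possible Python rendering of this type schema uses frozen
dataclasses, which gives structural equality for free. The class layout
is an assumption, and ``Scalar`` is a hypothetical placeholder for the
alternatives elided by ``...``::

    from dataclasses import dataclass
    from typing import Optional


    class Type:
        """Hypothetical base class for schema-described types."""


    @dataclass(frozen=True)
    class Scalar(Type):
        name: str                    # placeholder for the elided alternatives


    @dataclass(frozen=True)
    class Array(Type):
        dtype: Type
        ndim: int


    @dataclass(frozen=True)
    class Pointer(Type):
        base_type: Type
        size: Optional[int] = None   # ``int? size`` in the schema


    double = Scalar("double")
    matrix = Array(double, 2)
    row_ptr = Pointer(double)

Two ``Array`` instances with the same ``dtype`` and ``ndim`` compare
equal, which is the structural equality expected of nodes not marked
with ``@``.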

Since the schema specifies the interfaces of the different nodes, users
can supply their own node implementation (something we can do with the
type system). Hence user-written classes can be automatically
instantiated instead of generated ones. The code generator can still
emit code for serialization.

Low-level Portable IR
=====================

The low-level portable IR is a low-level, platform-agnostic IR that:

    * Has erased all control flow structures such as ``if``, ``while``
      and ``for``
    * Contains a low-level control flow graph embedded in the AST
    * Contains all branches down to the final LLVM IR level

      * i.e. the LLVM code generator cannot add basic blocks
        or branches
      * This means all runtime error handling has to be resolved by
        this IR

    * Contains only low-level, native types such as ``int_``,
      ``long_``, pointers, structs, etc. The notion of high-level
      concepts such as arrays or objects is gone.
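
To make the first three points concrete, the sketch below lowers a
structured ``if`` into basic blocks whose last operation is always an
explicit terminator. The tuple encoding and the operation names
``cbranch``, ``jump`` and ``ret`` are hypothetical and chosen only for
this example::

    def lower_if(cond, then_ops, else_ops):
        """Lower a structured ``if`` into blocks with explicit terminators."""
        return {
            "entry": [("cbranch", cond, "then", "else")],
            "then": list(then_ops) + [("jump", "exit")],
            "else": list(else_ops) + [("jump", "exit")],
            "exit": [("ret",)],
        }


    def check_terminated(blocks):
        """Check each block ends in a terminator targeting existing blocks."""
        for name, ops in blocks.items():
            kind, *operands = ops[-1]
            assert kind in {"cbranch", "jump", "ret"}, name
            # A conditional branch carries its condition first, then targets.
            targets = operands[1:] if kind == "cbranch" else operands
            assert all(target in blocks for target in targets), name
        return True


    blocks = lower_if("%cond", [("store", "%y", 2)], [("store", "%y", 3)])
    ok = check_terminated(blocks)

A check like ``check_terminated`` reflects the requirement that a later
code generator never has to add blocks or branches of its own.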