Numba recently gained support for compiling generator functions. This document explains some of the implementation choices.
For clarity, we distinguish between generator functions and
generators. A generator function is a function containing one or
several yield
statements. A generator (sometimes also called “generator
iterator”) is the return value of a generator function; it resumes
execution inside its frame each time next()
is called.
A yield point is the place where a yield
statement is called.
A resumption point is the place just after a yield point where execution
is resumed when next()
is called again.
Suppose we have the following simple generator function:
def gen(x, y):
yield x + y
yield x - y
Here is its CPython bytecode, as printed out using dis.dis()
:
7 0 LOAD_FAST 0 (x)
3 LOAD_FAST 1 (y)
6 BINARY_ADD
7 YIELD_VALUE
8 POP_TOP
8 9 LOAD_FAST 0 (x)
12 LOAD_FAST 1 (y)
15 BINARY_SUBTRACT
16 YIELD_VALUE
17 POP_TOP
18 LOAD_CONST 0 (None)
21 RETURN_VALUE
When compiling this function with NUMBA_DUMP_IR
set to 1, the
following information is printed out:
----------------------------------IR DUMP: gen----------------------------------
label 0:
x = arg(0, name=x) ['x']
y = arg(1, name=y) ['y']
$0.3 = x + y ['$0.3', 'x', 'y']
$0.4 = yield $0.3 ['$0.3', '$0.4']
del $0.4 []
del $0.3 []
$0.7 = x - y ['$0.7', 'x', 'y']
del y []
del x []
$0.8 = yield $0.7 ['$0.7', '$0.8']
del $0.8 []
del $0.7 []
$const0.9 = const(NoneType, None) ['$const0.9']
$0.10 = cast(value=$const0.9) ['$0.10', '$const0.9']
del $const0.9 []
return $0.10 ['$0.10']
------------------------------GENERATOR INFO: gen-------------------------------
generator state variables: ['$0.3', '$0.7', 'x', 'y']
yield point #1: live variables = ['x', 'y'], weak live variables = ['$0.3']
yield point #2: live variables = [], weak live variables = ['$0.7']
What does it mean? The first part is the Numba IR, as already seen in
Stage 2: Generate the Numba IR. We can see the two yield points (yield $0.3
and yield $0.7
).
The second part shows generator-specific information. To understand it we have to understand what suspending and resuming a generator means.
When suspending a generator, we are not merely returning a value to the
caller (the operand of the yield
statement). We also have to save the
generator’s current state in order to resume execution. In trivial use
cases, perhaps the CPU’s register values or stack slots would be preserved
until the next call to next(). However, any non-trivial case will hopelessly
clobber those values, so we have to save them in a well-defined place.
What are the values we need to save? Well, in the context of the Numba Intermediate Representation, we must save all live variables at each yield point. These live variables are computed thanks to the control flow graph.
Once live variables are saved and the generator is suspended, resuming the generator simply involves the inverse operation: the live variables are restored from the saved generator state.
Note
It is the same analysis which helps insert Numba del
instructions
where appropriate.
Let’s go over the generator info again:
generator state variables: ['$0.3', '$0.7', 'x', 'y']
yield point #1: live variables = ['x', 'y'], weak live variables = ['$0.3']
yield point #2: live variables = [], weak live variables = ['$0.7']
Numba has computed the union of all live variables (denoted as “state variables”). This will help define the layout of the generator structure. Also, for each yield point, we have computed two sets of variables:
yield
statement)Function analysis helps us gather enough information to define the layout of the generator structure, which will store the entire execution state of a generator. Here is a sketch of the generator structure’s layout, in pseudo-code:
struct gen_struct_t {
int32_t resume_index;
struct gen_args_t {
arg_0_t arg0;
arg_1_t arg1;
...
arg_N_t argN;
}
struct gen_state_t {
state_0_t state_var0;
state_1_t state_var1;
...
state_N_t state_varN;
}
}
Let’s describe those fields in order.
next()
is called); -1 means
the generator is exhausted and resumption must immediately raise
StopIteration. Other values indicate the yield point’s index starting from 1
(corresponding to the indices shown in the generator info above).x
and y
.Concretely, our example’s generator structure (assuming the generator function is called with floating-point numbers) is then:
struct gen_struct_t {
int32_t resume_index;
struct gen_args_t {
double arg0;
double arg1;
}
struct gen_state_t {
double $0.3;
double $0.7;
double x;
double y;
}
}
Note that here, saving x
and y
is redundant: Numba isn’t able to
recognize that the state variables x
and y
have the same value
as arg0
and arg1
.
How does Numba ensure the generator structure is preserved long enough? There are two cases:
tp_iternext
slot is a wrapper around the
generator’s native code.When compiling a generator function, three native functions are actually generated by Numba:
The next() function is the least straight-forward of the three native functions. It starts with a trampoline which dispatches execution to the right resume point depending on the resume index stored in the generator structure. Here is how the function start may look like in our example:
define i32 @"__main__.gen.next"(
double* nocapture %retptr,
{ i8*, i32 }** nocapture readnone %excinfo,
i8* nocapture readnone %env,
{ i32, { double, double }, { double, double, double, double } }* nocapture %arg.gen)
{
entry:
%gen.resume_index = getelementptr { i32, { double, double }, { double, double, double, double } }* %arg.gen, i64 0, i32 0
%.47 = load i32* %gen.resume_index, align 4
switch i32 %.47, label %stop_iteration [
i32 0, label %B0
i32 1, label %generator_resume1
i32 2, label %generator_resume2
]
; rest of the function snipped
(uninteresting stuff trimmed from the LLVM IR to make it more readable)
We recognize the pointer to the generator structure in %arg.gen
.
The trampoline switch has three targets (one for each resume index 0, 1
and 2), and a fallback target label named stop_iteration
. Label B0
represents the function’s start, generator_resume1
(resp.
generator_resume2
) is the resumption point after the first
(resp. second) yield point.
After generation by LLVM, the whole native assembly code for this function may look like this (on x86-64):
.globl __main__.gen.next
.align 16, 0x90
__main__.gen.next:
movl (%rcx), %eax
cmpl $2, %eax
je .LBB1_5
cmpl $1, %eax
jne .LBB1_2
movsd 40(%rcx), %xmm0
subsd 48(%rcx), %xmm0
movl $2, (%rcx)
movsd %xmm0, (%rdi)
xorl %eax, %eax
retq
.LBB1_5:
movl $-1, (%rcx)
jmp .LBB1_6
.LBB1_2:
testl %eax, %eax
jne .LBB1_6
movsd 8(%rcx), %xmm0
movsd 16(%rcx), %xmm1
movaps %xmm0, %xmm2
addsd %xmm1, %xmm2
movsd %xmm1, 48(%rcx)
movsd %xmm0, 40(%rcx)
movl $1, (%rcx)
movsd %xmm2, (%rdi)
xorl %eax, %eax
retq
.LBB1_6:
movl $-3, %eax
retq
Note the function returns 0 to indicate a value is yielded, -3 to indicate
StopIteration. %rcx
points to the start of the generator structure,
where the resume index is stored.