Demo 2: CUDA Backend for Tanh Approximation in GELU Activation Layer

(Depends on Ch.08)

This demo notebook accelerates the GELU activation function on a CUDA backend, using a Pade44 rational approximation for tanh. It builds on the previous demo and offloads the computation to the GPU using Numba and a custom backend.
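
For orientation, a [4/4] Padé approximant of tanh can be written as a ratio of two low-degree polynomials. The sketch below uses the standard-library `math` module on scalars rather than NumPy arrays. The function name `pade44_tanh` is chosen here for illustration, and the exact form used in demo 1 may differ (for example, it may be arranged differently or clamp the result).

```python
import math


def pade44_tanh(x: float) -> float:
    """Rational approximation tanh(x) ~ (10x^3 + 105x) / (x^4 + 45x^2 + 105).

    Illustrative name and form; not taken from the demo source.
    """
    x2 = x * x
    return x * (10.0 * x2 + 105.0) / (x2 * x2 + 45.0 * x2 + 105.0)


# Measure the worst-case absolute error on a grid over [-1, 1].
grid = [i / 1000.0 for i in range(-1000, 1001)]
max_err = max(abs(pade44_tanh(x) - math.tanh(x)) for x in grid)
print(f"max |error| on [-1, 1]: {max_err:.2e}")
```

The approximation is tight near zero but decays like 10/x for large |x|, whereas tanh saturates at ±1, so it is only useful on a bounded input range unless the result is clamped.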

The notebook demonstrates:

  • How to configure and use a GPU backend for vectorized ufuncs
  • How to run and test the optimized GELU function on CUDA
  • How to compare results with the original NumPy implementation
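
The last point, comparing against a reference, can be sketched on scalars without a GPU. The example below evaluates the tanh-based GELU formula once with `math.tanh` and once with a rational tanh approximation, then reports the largest deviation. The helper names, the grid over [-2, 2], and the choice of approximation are assumptions for illustration; the notebook's own comparison is against the NumPy implementation.

```python
import math


def gelu_tanh(a: float, tanh=math.tanh) -> float:
    """Tanh-based GELU: 0.5 * a * (1 + tanh(sqrt(2/pi) * (a + 0.044715 * a^3)))."""
    inner = math.sqrt(2.0 / math.pi) * (a + 0.044715 * a**3)
    return 0.5 * a * (1.0 + tanh(inner))


def rational_tanh(x: float) -> float:
    """Hypothetical stand-in for an approximate tanh (rational polynomial form)."""
    x2 = x * x
    return x * (10.0 * x2 + 105.0) / (x2 * x2 + 45.0 * x2 + 105.0)


# Compare the approximate and reference results on a bounded grid.
samples = [i / 100.0 for i in range(-200, 201)]
worst = max(abs(gelu_tanh(a, rational_tanh) - gelu_tanh(a)) for a in samples)
print(f"max |approx - reference| on [-2, 2]: {worst:.2e}")
```

With array outputs, the same check is typically done with a tolerance-based comparison such as `numpy.testing.assert_allclose`, since float32 results from different backends rarely match bit for bit.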
In [1]:
from numba import cuda
In [2]:
from ch08_gpu_offload import GPUBackend
from ch08_gpu_offload import gpu_compiler_config as _ch08_gpu_compiler_config
from demo01_gelu_tanh_approx import *
from utils.report import Report

Set Up the GPU Backend

Define a backend that combines the ufunc backend and GPU backend, enabling compilation and execution of vectorized functions on CUDA devices.

In [3]:
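class GpuUfuncBackend(Backend, GPUBackend):
    # Ufunc + GPU backend. `Backend` is provided by the demo 1 star import.
    def __init__(self, compile_only: bool = False):
        # Only GPUBackend.__init__ is invoked explicitly here.
        GPUBackend.__init__(self, compile_only)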
In [4]:
gpu_compiler_config = {
    **_ch08_gpu_compiler_config,
    "converter_class": ExtendEGraphToRVSDG,
    "cost_model": MyCostModel(),
    # Request compile-only mode when no CUDA device is available.
    "backend": GpuUfuncBackend(compile_only=not cuda.is_available()),
}
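
The configuration above is built by layering: the Chapter 8 defaults are unpacked first, and any key listed afterwards replaces the default. A minimal sketch of that pattern with plain dictionaries follows; the keys and values are placeholders, not the real contents of the Chapter 8 config.

```python
# Hypothetical defaults standing in for the Chapter 8 configuration.
base_config = {"backend": "cpu", "cost_model": "default", "verbose": False}

# Later entries win: unpack the base first, then list the overrides.
gpu_config = {**base_config, "backend": "gpu", "cost_model": "custom"}

# A per-call layer can add keys without mutating either earlier dict.
call_config = {**gpu_config, "pipeline_debug": True}

print(gpu_config["backend"], base_config["backend"])
print(sorted(call_config))
```

Each layer is a new dictionary, so the base stays reusable. The copy is shallow, though: object values such as a backend instance are shared between layers.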

Configure the CUDA Ufunc Pipeline

Set up the pipeline to compile the GELU function as a CUDA-accelerated vectorized ufunc, using the GPU backend and the Pade44 tanh approximation.

In [5]:
report = Report("Pipeline execution report", enable_nested_metadata=True)
cuda_vectorized_gelu = ufunc_vectorize(
    input_type=Float32,
    ndim=1,
    compiler_config={
        **gpu_compiler_config,
        "pipeline_report": report,
        "pipeline_debug": True,
    },
    extra_ruleset=additional_rules | optimize_rules,
)(gelu_tanh_forward)
if __name__ == "__main__":
    report.display()

Pipeline execution report

1. Frontend (12.08ms) ▶
Frontend
Debug Info on RVSDG ▶
--------------------------------original source---------------------------------
 106|def gelu_tanh_forward(a):
 107|    dt = np.float32
 108|    result = (
 109|        dt(0.5)
 110|        * a
 111|        * (
 112|            dt(1)
 113|            + np.tanh(np.sqrt(dt(2) / dt(np.pi)) * (a + dt(0.044715) * a**3))
 114|        )
 115|    )
 116|    return result
----------------------------------inter source----------------------------------
   1|def transformed_gelu_tanh_forward(a):
   2|    """#file: demo01_gelu_tanh_approx.py"""
   3|    '#loc: 107:8-107:23'
   4|    dt = np.float32
   5|    '#loc: 108:8-115:9'
   6|    result = dt(0.5) * a * (dt(1) + np.tanh(np.sqrt(dt(2) / dt(np.pi)) * (a + dt(0.044715) * a ** 3)))
   7|    '#loc: 116:8-116:21'
   8|    return result
RVSDG ▶
transformed_gelu_tanh_forward = Func (Args (ArgSpec 'a' (PyNone)))
$0 = Region[804] <- !io a
{
  $1 = PyLoadGlobal $0[0] 'np'
  $2 = PyAttr $0[0] $1 'float32'
  $3 = DbgValue 'dt' $2[1]
  $4 = PyFloat 0.5
  $5 = PyCall $3 $2[0] $4
  $6 = PyBinOp * $5[0] $5[1], $0[1]
  $7 = PyInt 1
  $8 = PyCall $3 $6[0] $7
  $9 = PyInt 2
  $10 = PyCall $3 $8[0] $9
  $11 = PyLoadGlobal $10[0] 'np'
  $12 = PyAttr $10[0] $11 'pi'
  $13 = PyCall $3 $12[0] $12[1]
  $14 = PyBinOp / $13[0] $10[1], $13[1]
  $15 = PyLoadGlobal $14[0] 'np'
  $16 = PyAttr $14[0] $15 'sqrt'
  $17 = PyCall $16[1] $16[0] $14[1]
  $18 = PyFloat 0.044715
  $19 = PyCall $3 $17[0] $18
  $20 = PyInt 3
  $21 = PyBinOp ** $19[0] $0[1], $20
  $22 = PyBinOp * $21[0] $19[1], $21[1]
  $23 = PyBinOp + $22[0] $0[1], $22[1]
  $24 = PyBinOp * $23[0] $17[1], $23[1]
  $25 = PyLoadGlobal $24[0] 'np'
  $26 = PyAttr $24[0] $25 'tanh'
  $27 = PyCall $26[1] $26[0] $24[1]
  $28 = PyBinOp + $27[0] $8[1], $27[1]
  $29 = PyBinOp * $28[0] $6[1], $28[1]
  $30 = DbgValue 'result' $29[1]
} [1268] -> !io=$29[0] !ret=$30
[metadata] ▶
time elapsed 12.08ms
timing breakdown:
  8.32ms: Debug Info on RVSDG 
  3.77ms: RVSDG               
2. EGraph Conversion (109.13ms) ▶
EGraph Conversion
EGraph ▶
[Graph rendering of the e-graph for transformed_gelu_tanh_forward: Region("804") and Term.Func("1274", ...) with Py_LoadGlobal, Py_AttrIO ("float32", "pi", "sqrt", "tanh"), Py_Call, Py_MulIO, Py_AddIO, Py_DivIO and Py_PowIO nodes over the literals 0.5, 1, 2, 3 and 0.044715.]
[metadata] ▶
time elapsed 109.13ms
timing breakdown:
  109.13ms: EGraph              
3. Egraph Saturation (269.30ms) ▶
Egraph Saturation
[debug] initial egraph ▶
[Graph rendering of the initial e-graph: the same nodes as the converted e-graph for transformed_gelu_tanh_forward, plus an ErrorMsg.root node.]
[debug] saturated egraph ▶
[Figure: Graphviz rendering of the saturated e-graph for transformed_gelu_tanh_forward. E-class clusters include Term, TermList, Port, PortList, Region, Module, Type, TypeVar, TypedIns, and TypedOuts; node labels include Nb_Add_Float32, Nb_Mul_Float32, Nb_Div_Float32, Nb_Pow_Float32_Int64, Npy_tanh_float32, Npy_sqrt_float32, and the literals 0.5, 0.044715, 3.141592653589793, 1.0, 10.0, 45.0, and 105.0.]
[debug] egglog.extract ▶
_Region_1 = Region("804", InPorts(Vec[String]("!io", "a")))
GraphRoot(
    Term.Func(
        "1274",
        "transformed_gelu_tanh_forward",
        Term.RegionEnd(
            _Region_1,
            PortList(
                Vec[Port](
                    Port("!io", _Region_1.get(0)),
                    Port(
                        "!ret",
                        Nb_Mul_Float32(
                            Nb_Mul_Float32(Npy_cast_f64_to_f32(Term.LiteralF64(0.5)), _Region_1.get(1)),
                            Nb_Add_Float32(
                                Npy_cast_i64_to_f32(Term.LiteralI64(1)),
                                Npy_tanh_float32(
                                    Nb_Mul_Float32(
                                        Npy_sqrt_float32(Nb_Div_Float32(Npy_cast_i64_to_f32(Term.LiteralI64(2)), Npy_cast_f64_to_f32(Term.LiteralF64(3.141592653589793)))),
                                        Nb_Add_Float32(
                                            _Region_1.get(1),
                                            Nb_Mul_Float32(Npy_cast_f64_to_f32(Term.LiteralF64(0.044715)), Nb_Pow_Float32_Int64(_Region_1.get(1), Term.LiteralI64(3))),
                                        ),
                                    )
                                ),
                            ),
                        ),
                    ),
                )
            ),
        ),
    )
)
[metadata] ▶
time elapsed 269.30ms
timing breakdown:
  46.46ms: [debug] initial egraph
  194.40ms: [debug] saturated egraph
  28.44ms: [debug] egglog.extract
4. EGraph Extraction (20.04ms) ▶
EGraph Extraction
Extracted RVSDG ▶
transformed_gelu_tanh_forward = Func (Args (ArgSpec 'a' (PyNone)))
$0 = Region[1575] <- !io a; #attrs (_, Float32)->(_, Float32)
{
  $1 = PyFloat 0.5
  $2 = NbOp_F64_to_F32 $1
  $3 = NbOp_Mul_Float32 $2 $0[1]
  $4 = PyInt 1
  $5 = NbOp_I64_to_F32 $4
  $6 = PyFloat 10.0
  $7 = NbOp_F64_to_F32 $6
  $8 = PyInt 2
  $9 = NbOp_I64_to_F32 $8
  $10 = PyFloat 3.141592653589793
  $11 = NbOp_F64_to_F32 $10
  $12 = NbOp_Div_Float32 $9 $11
  $13 = NpyOp_Sqrt_Float32 $12
  $14 = PyFloat 0.044715
  $15 = NbOp_F64_to_F32 $14
  $16 = PyFloat 1.0
  $17 = NbOp_F64_to_F32 $16
  $18 = NbOp_Mul_Float32 $0[1] $17
  $19 = NbOp_Mul_Float32 $0[1] $18
  $20 = NbOp_Mul_Float32 $0[1] $19
  $21 = NbOp_Mul_Float32 $15 $20
  $22 = NbOp_Add_Float32 $0[1] $21
  $23 = NbOp_Mul_Float32 $13 $22
  $24 = NbOp_Mul_Float32 $23 $17
  $25 = NbOp_Mul_Float32 $23 $24
  $26 = NbOp_Mul_Float32 $23 $25
  $27 = NbOp_Mul_Float32 $7 $26
  $28 = PyFloat 105.0
  $29 = NbOp_F64_to_F32 $28
  $30 = NbOp_Mul_Float32 $29 $23
  $31 = NbOp_Add_Float32 $27 $30
  $32 = NbOp_Mul_Float32 $23 $26
  $33 = PyFloat 45.0
  $34 = NbOp_F64_to_F32 $33
  $35 = NbOp_Mul_Float32 $34 $25
  $36 = NbOp_Add_Float32 $32 $35
  $37 = NbOp_Add_Float32 $36 $29
  $38 = NbOp_Div_Float32 $31 $37
  $39 = NbOp_Add_Float32 $5 $38
  $40 = NbOp_Mul_Float32 $3 $39
} [1782] -> !io=$0[0] !ret=$40
Extracted cost ▶
14747.0
[metadata] ▶
time elapsed 20.04ms
timing breakdown:
  20.03ms: Extracted RVSDG     
  0.01ms: Extracted cost      
5. Backend (3.23ms) ▶
Backend
Lowered module ▶
module {
  func.func @func(%arg0: f32) -> f32 attributes {llvm.emit_c_interface} {
    %cst = arith.constant 5.000000e-01 : f64
    %c1_i64 = arith.constant 1 : i64
    %cst_0 = arith.constant 1.000000e+01 : f64
    %c2_i64 = arith.constant 2 : i64
    %cst_1 = arith.constant 3.1415926535897931 : f64
    %cst_2 = arith.constant 4.471500e-02 : f64
    %cst_3 = arith.constant 1.000000e+00 : f64
    %cst_4 = arith.constant 1.050000e+02 : f64
    %cst_5 = arith.constant 4.500000e+01 : f64
    cf.br ^bb1
  ^bb1:  // pred: ^bb0
    %c0_i32 = arith.constant 0 : i32
    %0 = arith.truncf %cst : f64 to f32
    %1 = arith.mulf %0, %arg0 : f32
    %2 = arith.sitofp %c1_i64 : i64 to f32
    %3 = arith.truncf %cst_0 : f64 to f32
    %4 = arith.sitofp %c2_i64 : i64 to f32
    %5 = arith.truncf %cst_1 : f64 to f32
    %6 = arith.divf %4, %5 : f32
    %7 = math.sqrt %6 : f32
    %8 = arith.truncf %cst_2 : f64 to f32
    %9 = arith.truncf %cst_3 : f64 to f32
    %10 = arith.mulf %arg0, %9 : f32
    %11 = arith.mulf %arg0, %10 : f32
    %12 = arith.mulf %arg0, %11 : f32
    %13 = arith.mulf %8, %12 : f32
    %14 = arith.addf %arg0, %13 : f32
    %15 = arith.mulf %7, %14 : f32
    %16 = arith.mulf %15, %9 : f32
    %17 = arith.mulf %15, %16 : f32
    %18 = arith.mulf %15, %17 : f32
    %19 = arith.mulf %3, %18 : f32
    %20 = arith.truncf %cst_4 : f64 to f32
    %21 = arith.mulf %20, %15 : f32
    %22 = arith.addf %19, %21 : f32
    %23 = arith.mulf %15, %18 : f32
    %24 = arith.truncf %cst_5 : f64 to f32
    %25 = arith.mulf %24, %17 : f32
    %26 = arith.addf %23, %25 : f32
    %27 = arith.addf %26, %20 : f32
    %28 = arith.divf %22, %27 : f32
    %29 = arith.addf %2, %28 : f32
    %30 = arith.mulf %1, %29 : f32
    return %30 : f32
  }
}
[metadata] ▶
time elapsed 3.23ms
timing breakdown:
  3.23ms: Lowered module      
6. MLIR passes (6.50ms) ▶
MLIR passes
MLIR optimized ▶
#map = affine_map<(d0)[s0, s1] -> ((d0 - s0) ceildiv s1)>
#map1 = affine_map<(d0)[s0, s1] -> (d0 * s0 + s1)>
module attributes {gpu.container_module} {
  llvm.func @sqrtf(f32) -> f32 attributes {memory = #llvm.memory_effects<other = none, argMem = none, inaccessibleMem = none>, sym_visibility = "private"}
  llvm.func @func(%arg0: f32) -> f32 attributes {llvm.emit_c_interface} {
    %0 = llvm.mlir.constant(4.500000e+01 : f32) : f32
    %1 = llvm.mlir.constant(1.050000e+02 : f32) : f32
    %2 = llvm.mlir.constant(2.000000e+00 : f32) : f32
    %3 = llvm.mlir.constant(1.000000e+01 : f32) : f32
    %4 = llvm.mlir.constant(1.000000e+00 : f32) : f32
    %5 = llvm.mlir.constant(5.000000e-01 : f32) : f32
    %6 = llvm.mlir.constant(3.1415926535897931 : f64) : f64
    %7 = llvm.mlir.constant(4.471500e-02 : f64) : f64
    %8 = llvm.fmul %arg0, %5  : f32
    %9 = llvm.fptrunc %6 : f64 to f32
    %10 = llvm.fdiv %2, %9  : f32
    %11 = llvm.call @sqrtf(%10) : (f32) -> f32
    %12 = llvm.fptrunc %7 : f64 to f32
    %13 = llvm.fmul %arg0, %arg0  : f32
    %14 = llvm.fmul %arg0, %13  : f32
    %15 = llvm.fmul %12, %14  : f32
    %16 = llvm.fadd %arg0, %15  : f32
    %17 = llvm.fmul %11, %16  : f32
    %18 = llvm.fmul %17, %17  : f32
    %19 = llvm.fmul %17, %18  : f32
    %20 = llvm.fmul %19, %3  : f32
    %21 = llvm.fmul %17, %1  : f32
    %22 = llvm.fadd %20, %21  : f32
    %23 = llvm.fmul %17, %19  : f32
    %24 = llvm.fmul %18, %0  : f32
    %25 = llvm.fadd %23, %24  : f32
    %26 = llvm.fadd %25, %1  : f32
    %27 = llvm.fdiv %22, %26  : f32
    %28 = llvm.fadd %27, %4  : f32
    %29 = llvm.fmul %8, %28  : f32
    llvm.return %29 : f32
  }
  llvm.func @_mlir_ciface_func(%arg0: f32) -> f32 attributes {llvm.emit_c_interface} {
    %0 = llvm.call @func(%arg0) : (f32) -> f32
    llvm.return %0 : f32
  }
  llvm.func @ufunc(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: i64, %arg3: i64, %arg4: i64, %arg5: !llvm.ptr, %arg6: !llvm.ptr, %arg7: i64, %arg8: i64, %arg9: i64) attributes {llvm.emit_c_interface} {
    %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    %1 = llvm.insertvalue %arg5, %0[0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> 
    %2 = llvm.insertvalue %arg6, %1[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> 
    %3 = llvm.insertvalue %arg7, %2[2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> 
    %4 = llvm.insertvalue %arg8, %3[3, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> 
    %5 = llvm.insertvalue %arg9, %4[4, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> 
    %6 = builtin.unrealized_conversion_cast %5 : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> to memref<?xf32>
    %7 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    %8 = llvm.insertvalue %arg0, %7[0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> 
    %9 = llvm.insertvalue %arg1, %8[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> 
    %10 = llvm.insertvalue %arg2, %9[2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> 
    %11 = llvm.insertvalue %arg3, %10[3, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> 
    %12 = llvm.insertvalue %arg4, %11[4, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> 
    %13 = builtin.unrealized_conversion_cast %12 : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> to memref<?xf32>
    %14 = llvm.mlir.constant(4.471500e-02 : f64) : f64
    %15 = llvm.mlir.constant(3.1415926535897931 : f64) : f64
    %16 = llvm.mlir.constant(5.000000e-01 : f32) : f32
    %17 = llvm.mlir.constant(1.000000e+00 : f32) : f32
    %18 = llvm.mlir.constant(1.000000e+01 : f32) : f32
    %19 = llvm.mlir.constant(2.000000e+00 : f32) : f32
    %20 = llvm.mlir.constant(1.050000e+02 : f32) : f32
    %21 = llvm.mlir.constant(4.500000e+01 : f32) : f32
    %22 = llvm.mlir.constant(0 : index) : i64
    %23 = builtin.unrealized_conversion_cast %arg3 : i64 to index
    %24 = llvm.mlir.constant(0 : index) : i64
    %25 = builtin.unrealized_conversion_cast %24 : i64 to index
    %26 = llvm.mlir.constant(1 : index) : i64
    %27 = builtin.unrealized_conversion_cast %26 : i64 to index
    %28 = llvm.mlir.constant(1 : index) : i64
    %29 = builtin.unrealized_conversion_cast %28 : i64 to index
    %30 = affine.apply #map(%23)[%25, %27]
    gpu.launch_func  @ufunc_kernel::@ufunc_kernel blocks in (%30, %29, %29) threads in (%29, %29, %29)  args(%27 : index, %25 : index, %13 : memref<?xf32>, %16 : f32, %15 : f64, %19 : f32, %14 : f64, %18 : f32, %20 : f32, %21 : f32, %17 : f32, %6 : memref<?xf32>)
    llvm.return
  }
  llvm.func @_mlir_ciface_ufunc(%arg0: !llvm.ptr, %arg1: !llvm.ptr) attributes {llvm.emit_c_interface} {
    %0 = llvm.load %arg0 : !llvm.ptr -> !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    %1 = llvm.extractvalue %0[0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> 
    %2 = llvm.extractvalue %0[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> 
    %3 = llvm.extractvalue %0[2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> 
    %4 = llvm.extractvalue %0[3, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> 
    %5 = llvm.extractvalue %0[4, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> 
    %6 = llvm.load %arg1 : !llvm.ptr -> !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    %7 = llvm.extractvalue %6[0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> 
    %8 = llvm.extractvalue %6[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> 
    %9 = llvm.extractvalue %6[2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> 
    %10 = llvm.extractvalue %6[3, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> 
    %11 = llvm.extractvalue %6[4, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> 
    llvm.call @ufunc(%1, %2, %3, %4, %5, %7, %8, %9, %10, %11) : (!llvm.ptr, !llvm.ptr, i64, i64, i64, !llvm.ptr, !llvm.ptr, i64, i64, i64) -> ()
    llvm.return
  }
  gpu.module @ufunc_kernel {
    llvm.func @sqrtf(f32) -> f32 attributes {memory = #llvm.memory_effects<other = none, argMem = none, inaccessibleMem = none>, sym_visibility = "private"}
    gpu.func @ufunc_kernel(%arg0: index, %arg1: index, %arg2: memref<?xf32>, %arg3: f32, %arg4: f64, %arg5: f32, %arg6: f64, %arg7: f32, %arg8: f32, %arg9: f32, %arg10: f32, %arg11: memref<?xf32>) kernel attributes {known_block_size = array<i32: 1, 1, 1>} {
      %0 = builtin.unrealized_conversion_cast %arg11 : memref<?xf32> to !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
      %1 = builtin.unrealized_conversion_cast %arg2 : memref<?xf32> to !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
      %block_id_x = gpu.block_id  x
      %block_id_y = gpu.block_id  y
      %block_id_z = gpu.block_id  z
      %thread_id_x = gpu.thread_id  x
      %thread_id_y = gpu.thread_id  y
      %thread_id_z = gpu.thread_id  z
      %grid_dim_x = gpu.grid_dim  x
      %grid_dim_y = gpu.grid_dim  y
      %grid_dim_z = gpu.grid_dim  z
      %block_dim_x = gpu.block_dim  x
      %block_dim_y = gpu.block_dim  y
      %block_dim_z = gpu.block_dim  z
      %2 = affine.apply #map1(%block_id_x)[%arg0, %arg1]
      %3 = builtin.unrealized_conversion_cast %2 : index to i64
      %4 = llvm.extractvalue %1[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> 
      %5 = llvm.getelementptr %4[%3] : (!llvm.ptr, i64) -> !llvm.ptr, f32
      %6 = llvm.load %5 : !llvm.ptr -> f32
      %7 = llvm.fmul %6, %arg3  : f32
      %8 = llvm.fptrunc %arg4 : f64 to f32
      %9 = llvm.fdiv %arg5, %8  : f32
      %10 = llvm.call @sqrtf(%9) : (f32) -> f32
      %11 = llvm.fptrunc %arg6 : f64 to f32
      %12 = llvm.fmul %6, %6  : f32
      %13 = llvm.fmul %6, %12  : f32
      %14 = llvm.fmul %11, %13  : f32
      %15 = llvm.fadd %6, %14  : f32
      %16 = llvm.fmul %10, %15  : f32
      %17 = llvm.fmul %16, %16  : f32
      %18 = llvm.fmul %16, %17  : f32
      %19 = llvm.fmul %18, %arg7  : f32
      %20 = llvm.fmul %16, %arg8  : f32
      %21 = llvm.fadd %19, %20  : f32
      %22 = llvm.fmul %16, %18  : f32
      %23 = llvm.fmul %17, %arg9  : f32
      %24 = llvm.fadd %22, %23  : f32
      %25 = llvm.fadd %24, %arg8  : f32
      %26 = llvm.fdiv %21, %25  : f32
      %27 = llvm.fadd %26, %arg10  : f32
      %28 = llvm.fmul %7, %27  : f32
      %29 = llvm.extractvalue %0[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> 
      %30 = llvm.getelementptr %29[%3] : (!llvm.ptr, i64) -> !llvm.ptr, f32
      llvm.store %28, %30 : f32, !llvm.ptr
      gpu.return
    }
  }
}
[metadata] ▶
time elapsed 6.50ms
timing breakdown:
  6.50ms: MLIR optimized      
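
The launch geometry in the optimized module can be read off the two affine maps: `#map` computes the number of blocks, and `#map1` computes the element index handled by each block. With the constants passed at the `gpu.launch_func` site (lower bound 0, step 1, and a block size of 1), this appears to reduce to one block per array element. The pure-Python sketch below mirrors that arithmetic only; the names `ceildiv`, `num_blocks`, and `element_index` are illustrative and are not part of the pipeline.

```python
def ceildiv(a: int, b: int) -> int:
    """Integer ceiling division, matching the affine `ceildiv` operator."""
    return -(-a // b)


def num_blocks(n: int, lower: int = 0, step: int = 1) -> int:
    # #map: (d0)[s0, s1] -> ((d0 - s0) ceildiv s1)
    # d0 is the array length, s0 the lower bound, s1 the step.
    return ceildiv(n - lower, step)


def element_index(block_id: int, step: int = 1, lower: int = 0) -> int:
    # #map1: (d0)[s0, s1] -> (d0 * s0 + s1)
    # d0 is gpu.block_id x, s0 the step, s1 the lower bound.
    return block_id * step + lower


n = 100  # same length as the test input used later in the notebook
blocks = num_blocks(n)
indices = [element_index(b) for b in range(blocks)]
print(blocks, indices[:5])
```

Since `known_block_size` is `1, 1, 1`, each block runs a single thread. That keeps the mapping simple, but it likely leaves most of the per-block parallelism of a GPU unused.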

Test GELU Ufunc on CUDA

Run the compiled CUDA ufunc on 100 random float32 values drawn from [0, 1) and compare the result against the original NumPy implementation using `np.allclose` with `rtol=1e-6`. If CUDA is unavailable, the test is skipped.

In [6]:
if __name__ == "__main__":
    if not cuda.is_available():
        print("SKIPPED. CUDA unavailable")
    else:
        # Compare with rtol=1e-6; atol stays at NumPy's default.
        def relclose(x, y):
            return np.allclose(x, y, rtol=1e-6)

        # 100 float32 samples drawn uniformly from [0, 1)
        input_val = np.random.random(100).astype(np.float32)
        report.display()
        run_test(
            gelu_tanh_forward,
            cuda_vectorized_gelu,
            (input_val,),
            equal=relclose,
            verbose=True,
        )
SKIPPED. CUDA unavailable
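
When CUDA is unavailable, the arithmetic of the extracted kernel can still be checked on the CPU. The sketch below is a scalar, pure-Python transcription of the rational form that appears in the extracted RVSDG, `(10*x**3 + 105*x) / (x**4 + 45*x**2 + 105)`, compared against `math.tanh`. It runs in Python's double precision rather than float32, and the helper names `tanh_pade44`, `gelu_reference`, and `gelu_pade` are assumptions made for illustration, not names from the notebook.

```python
import math


def tanh_pade44(x: float) -> float:
    """Rational tanh approximation as read from the extracted RVSDG."""
    x2 = x * x
    return (10.0 * x2 * x + 105.0 * x) / (x2 * x2 + 45.0 * x2 + 105.0)


def _inner(a: float) -> float:
    # Argument passed to tanh in the tanh-based GELU formula.
    return math.sqrt(2.0 / math.pi) * (a + 0.044715 * a**3)


def gelu_reference(a: float) -> float:
    """GELU with the exact tanh from the standard library."""
    return 0.5 * a * (1.0 + math.tanh(_inner(a)))


def gelu_pade(a: float) -> float:
    """GELU with tanh replaced by the rational approximation."""
    return 0.5 * a * (1.0 + tanh_pade44(_inner(a)))


# Sample the same [0, 1) range that the notebook test draws from.
samples = [i / 100.0 for i in range(100)]
max_abs_err = max(abs(gelu_pade(a) - gelu_reference(a)) for a in samples)
print(f"max abs error on [0, 1): {max_abs_err:.3e}")
```

The rational form tracks tanh closely near zero and drifts as the magnitude of the argument grows, so a check over [0, 1) says little about accuracy on wider input ranges.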

Benchmark

In [7]:
if __name__ == "__main__":
    if not cuda.is_available():
        print("SKIPPED. CUDA unavailable")
    else:
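        # 300,000 float32 samples; preallocate the output buffer so the
        # CUDA ufunc writes into `out` on each timed call.
        input_val = np.random.random(300000).astype(np.float32)
        out = np.zeros_like(input_val)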

        print("original")
        %timeit gelu_tanh_forward(input_val)
        print("superoptimized")
        %timeit cuda_vectorized_gelu(input_val, out=out)
SKIPPED. CUDA unavailable