Creating C Extensions for Python with Numpy and CUDA (Part 1)

July 3, 2019

Part 2 is available here, covering Numpy integration and CUDA kernels. A full Github repository containing all this code can be found here.

The Python C-API lets you write functions in C and call them like normal Python functions. This is super useful for computationally heavy code, and it can even be used to call CUDA kernels from Python. There don’t seem to be a good tutorial about how to do this, so we’re going to walk through the process of defining a couple C functions with support for CUDA and calling them natively from Python.

Writing a simple hello world program with the Python C-API is easy. Here it is:

#include <stdio.h>
#include <Python.h>

static PyObject* helloworld(PyObject* self, PyObject* args) {
    printf("Hello World\n");
    return Py_None;

static PyMethodDef methods[] = {
    {"helloworld", helloworld, METH_NOARGS, "A Simple Hello World Function"}, // (function name, function, arguments, doc_string)
    {NULL, NULL, 0, NULL}

static struct PyModuleDef myModule = {
    PyModuleDef_HEAD_INIT, "myModule", // name of the module
    "myModule", -1, methods

PyMODINIT_FUNC PyInit_myModule(void) {
    return PyModule_Create(&myModule);

Copy this into a myModule.c file, and then create a script using the convenient Python distutils package to build and install it.

from distutils.core import setup, Extension

setup(name = 'customModule', version = '1.0',  \
   ext_modules = [Extension('myModule', ['myModule.c'])])

With these two files, you can simply run python build && python install to install the module. Then

>>> import customModule
>>> customModule.helloworld()
Hello World

will print Hello World as expected. That was easy. The distutils package handles most of the installation for you. The PyMethodDef object defines the methods supported by the customModule module. We have to define the function name, the name of the C function being called, the kind of arguments expected (METH_NOARGS = no arguments in this case), and a doc string. The helloworld function itself takes the “self” object and a list of args (again, empty in this case) and returns the None PyObject, since it shouldn’t return anything.

Something more complicated

That wasn’t hard. Now we can try doing something a little more complicated, say writing a simple C function to compute the first n fibonacci numbers in C. We can even add memoization! Thanks to Elliot Forbes for the basic fibonacci idea!

static unsigned long long * memo = NULL; // static cache used to keep track of memoized values

unsigned long long cfib(int n) {
    unsigned long long value;

    if (n < 2 || memo[n] != 0) 
        return memo[n];
    else {
        value = cfib(n-1) + cfib(n-2);
        memo[n] = value;
        return value;

// Our Python binding to our C function
// This will take one and only one non-keyword argument
static PyObject* fib(PyObject* self, PyObject* args) {
    int n;
    if (!PyArg_ParseTuple(args, "i", &n))
        return NULL;
    if (n < 2) {
        return Py_BuildValue("i", n);

    memo = (unsigned long long *) calloc(n + 1, sizeof(unsigned long long));  // memoization, initialized to 0
    if (memo == NULL) {
        PyErr_SetString(PyExc_RuntimeError, "Unable to dynamically allocate memory for memoization.");
        return NULL;

    memo[0] = 0; // set initial conditions
    memo[1] = 1;
    // return our computed fib number
    PyObject* value = PyLong_FromUnsignedLongLong(cfib(n));
    return Py_BuildValue("N", value);

static PyMethodDef methods[] = {
    {"helloworld", helloworld, METH_NOARGS, "Prints Hello World"},
    {"fib", fib, METH_VARARGS, "Computes the nth Fibonacci number"}, // METH_VARARGS allows for arbitrary positional arguments

A few things need explanations here. First of all, the memoized fibonacci sequence is cool. When the fib function is called, it zero-initializes an array of length n + 1 which we use to store previously computed fibonacci values. This means we don’t duplicate work by computing cfib(n) an exponential number f times. This actually becomes a linear algorithm.

The PyArg_ParseTuple function also deserves some explanation. When the METH_VARARGS flag is set in the methods list, arguments are passed to the C-API as a tuple, which can be unpacked by passing the appropriate flag to the ParseTuple method (documentation is here). Here we pass the “i” flag to tell the function that we expect an integer valued argument. PyErr_SetString specifies the type of error that occurred (a runtime memory error in this case) and the explanation string. This gets passed back to Python and printed.

Cool. All of this can now be built and run. You can play around with writing your own functions, adding new methods, and more. In Part 2, we’ll talk about adding support for Numpy and CUDA functions!

Update: To read part 2, covering Numpy integration and CUDA kernels, click here. All this code is available on a Github page here with an installation script