Using mkl_set_num_threads with NumPy

I'm trying to set the number of threads for numpy calculations with mkl_set_num_threads like this:

import numpy
import ctypes
mkl_rt = ctypes.CDLL('libmkl_rt.so')
mkl_rt.mkl_set_num

Solution 1:

Ophion led me the right way. Despite what the documentation says, the parameter of mkl_set_num_threads has to be passed by reference.

Now I have defined two functions, for getting and setting the number of threads:

import numpy
import ctypes
mkl_rt = ctypes.CDLL('libmkl_rt.so')
mkl_get_max_threads = mkl_rt.mkl_get_max_threads
def mkl_set_num_threads(cores):
    mkl_rt.mkl_set_num_threads(ctypes.byref(ctypes.c_int(cores)))

mkl_set_num_threads(4)
print(mkl_get_max_threads())  # says 4

and they work as expected.

Edit: according to Rufflewind, the C functions have CamelCase names, and those variants expect their parameters by value:

import ctypes
mkl_rt = ctypes.CDLL('libmkl_rt.so')
mkl_set_num_threads = mkl_rt.MKL_Set_Num_Threads
mkl_get_max_threads = mkl_rt.MKL_Get_Max_Threads
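
If you go the by-value route, it may help to tell ctypes the signatures explicitly. The following is only a sketch: the helper name load_mkl_by_value is made up for illustration, the library names are the two that appear in this thread, and the signatures are assumed from the header excerpt quoted in Solution 2 (void MKL_Set_Num_Threads(int)) plus an int return for MKL_Get_Max_Threads.

```python
import ctypes


def load_mkl_by_value(names=("libmkl_rt.so", "mkl_rt.dll")):
    """Return (set_num_threads, get_max_threads) bound to the CamelCase
    C entry points, or None if no MKL runtime could be loaded.

    Hypothetical helper; the library names are assumptions taken from
    this thread and may differ on your system.
    """
    for name in names:
        try:
            lib = ctypes.CDLL(name)
        except OSError:
            continue  # library not found under this name, try the next
        try:
            set_threads = lib.MKL_Set_Num_Threads
            get_threads = lib.MKL_Get_Max_Threads
        except AttributeError:
            continue  # loaded something, but it lacks the expected symbols
        # Declare the assumed C signatures so ctypes passes a plain int.
        set_threads.argtypes = [ctypes.c_int]
        set_threads.restype = None
        get_threads.argtypes = []
        get_threads.restype = ctypes.c_int
        return set_threads, get_threads
    return None


funcs = load_mkl_by_value()
if funcs is not None:
    mkl_set_num_threads, mkl_get_max_threads = funcs
    mkl_set_num_threads(4)
    print(mkl_get_max_threads())
```

Returning None instead of raising keeps the script usable on machines without MKL.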

Solution 2:

Long story short, use MKL_Set_Num_Threads and its CamelCased friends when calling MKL from Python. The same applies to C if you don't #include <mkl.h>.


The MKL documentation seems to suggest that the correct type signature in C is:

void mkl_set_num_threads(int nt);

Okay, let's try a minimal program then:

void mkl_set_num_threads(int);
int main(void) {
    mkl_set_num_threads(1);
    return 0;
}

Compile it with GCC and boom, Segmentation fault again. So it seems the problem isn't restricted to Python.

Running it through a debugger (GDB) reveals:

Program received signal SIGSEGV, Segmentation fault.
0x0000… in mkl_set_num_threads_ ()
   from /…/mkl/lib/intel64/libmkl_intel_lp64.so

Wait a second, mkl_set_num_threads_?? That's the Fortran version of mkl_set_num_threads! How did we end up calling the Fortran version? (Keep in mind that Fortran's calling convention requires arguments to be passed as pointers rather than by value.)

It turns out the documentation was a complete façade. If you actually inspect the header files for the recent versions of MKL, you will find this cute little definition:

void MKL_Set_Num_Threads(int nth);
#define mkl_set_num_threads         MKL_Set_Num_Threads

… and now everything makes sense! The correct function to call (for C code) is MKL_Set_Num_Threads, not mkl_set_num_threads. Inspecting the symbol table reveals that there are actually four different variants defined:

nm -D /…/mkl/lib/intel64/libmkl_rt.so | grep -i mkl_set_num_threads
00000000000e3060 T MKL_SET_NUM_THREADS
…
00000000000e30b0 T MKL_Set_Num_Threads
…
00000000000e3060 T mkl_set_num_threads
00000000000e3060 T mkl_set_num_threads_
…

Why did Intel put in four different variants of one function despite there being only C and Fortran variants in the documentation? I don't know for certain, but I suspect it's for compatibility with different Fortran compilers. You see, Fortran calling convention is not standardized. Different compilers will mangle the names of the functions differently:

  • some use upper case,
  • some use lower case with a trailing underscore, and
  • some use lower case with no decoration at all.

There may even be other ways that I'm not aware of. This trick allows the MKL library to be used with most Fortran compilers without any modification, the downside being that C functions need to be "mangled" to make room for the 3 variants of the Fortran calling convention.
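
To make the three spellings concrete, here is a small sketch that generates them for a given routine name and checks which ones a loaded library exports. Both helper names are made up for illustration, and the list only covers the three conventions mentioned above.

```python
def fortran_symbol_variants(name):
    """Return the three Fortran-style spellings discussed above:
    upper case, lower case with a trailing underscore, and plain lower case."""
    base = name.lower().rstrip("_")
    return [base.upper(), base + "_", base]


def exported_variants(lib, name):
    """Return the spellings that `lib` actually exposes.

    `lib` can be a ctypes.CDLL object: attribute lookup on it raises
    AttributeError for missing symbols, so hasattr() works as a probe.
    """
    return [sym for sym in fortran_symbol_variants(name) if hasattr(lib, sym)]


print(fortran_symbol_variants("mkl_set_num_threads"))
```

With a real MKL runtime you could pass ctypes.CDLL('libmkl_rt.so') as lib. Going by the nm output above, I would expect all three spellings to show up, but I have only reasoned about that from the symbol table.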

Solution 3:

For people looking for a cross-platform, packaged solution, note that we have recently released threadpoolctl, a module to limit the number of threads used in C-level threadpools called by Python (OpenBLAS, OpenMP and MKL). See this answer for more info.
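
A minimal sketch of how this can look, assuming threadpoolctl is installed (it is a third-party package). The wrapper run_with_blas_limit is a made-up name for illustration, and the import is guarded so the snippet still runs, without any limit, where the package is missing.

```python
try:
    # Third-party package: pip install threadpoolctl
    from threadpoolctl import threadpool_limits
except ImportError:
    threadpool_limits = None


def run_with_blas_limit(func, max_threads):
    """Call func() while BLAS thread pools (MKL, OpenBLAS) are capped
    at max_threads. Hypothetical helper: if threadpoolctl is not
    available, func() simply runs without any limit."""
    if threadpool_limits is None:
        return func()
    with threadpool_limits(limits=max_threads, user_api="blas"):
        return func()


# Example: replace the lambda with your numpy computation.
print(run_with_blas_limit(lambda: sum(range(10)), 2))
```

The limit only applies inside the with block. The previous settings are restored on exit.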

Solution 4:

For people looking for the complete solution, you can use a context manager:

import ctypes


class MKLThreads(object):
    _mkl_rt = None

    @classmethod
    def _mkl(cls):
        if cls._mkl_rt is None:
            try:
                cls._mkl_rt = ctypes.CDLL('libmkl_rt.so')
            except OSError:
                cls._mkl_rt = ctypes.CDLL('mkl_rt.dll')
        return cls._mkl_rt

    @classmethod
    def get_max_threads(cls):
        return cls._mkl().mkl_get_max_threads()

    @classmethod
    def set_num_threads(cls, n):
        assert type(n) == int
        cls._mkl().mkl_set_num_threads(ctypes.byref(ctypes.c_int(n)))

    def __init__(self, num_threads):
        self._n = num_threads
        self._saved_n = self.get_max_threads()

    def __enter__(self):
        self.set_num_threads(self._n)
        return self

    def __exit__(self, type, value, traceback):
        self.set_num_threads(self._saved_n)

Then use it like:

with MKLThreads(2):
    # do some stuff on two cores
    pass

Or just change the configuration directly by calling the following functions:

# Example
MKLThreads.set_num_threads(3)
print(MKLThreads.get_max_threads())

Code is also available in this gist.
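
If you prefer a function-style context manager, the same save/restore pattern can be sketched with contextlib. This is an illustrative variant, not the code from the gist: thread_limit is a made-up name, and the getter and setter are passed in as plain callables so the pattern can be tried without MKL installed.

```python
from contextlib import contextmanager


@contextmanager
def thread_limit(n, get_threads, set_threads):
    """Temporarily set the thread count to n, restoring the old value on exit.

    get_threads and set_threads are whatever callables read and write
    the setting, for example MKLThreads.get_max_threads and
    MKLThreads.set_num_threads from the class above.
    """
    saved = get_threads()
    set_threads(n)
    try:
        yield
    finally:
        # Runs even if the body raises, so the old value is always restored.
        set_threads(saved)


# Stand-in for a real thread setting, so the sketch runs anywhere.
state = {"threads": 8}
with thread_limit(2, lambda: state["threads"],
                  lambda v: state.update(threads=v)):
    print(state["threads"])  # 2 inside the block
print(state["threads"])      # 8 again afterwards
```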