Python: Nested For Loop Is Extremely Slow - Reading RLE Compressed 3D Data
Solution 1:
Speed up loops
First, don't use np.arange to create the loop index (this allocates a full array that you then iterate over). Use range (Python 3) or xrange (Python 2) instead. This should improve performance by a few percent, but it isn't your real bottleneck here.
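As a minimal illustration of the difference (a hypothetical loop, not part of the original code):

import numpy as np

n = 10**6
for i in np.arange(n):   # allocates an array of n indices up front
    pass
for i in range(n):       # produces the indices lazily, one at a time
    pass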
Matlab has a just-in-time compiler, so it performs relatively well in loops; CPython doesn't have one by default. But there is a just-in-time compiler for Python called numba (http://numba.pydata.org/). Its documentation lists the functions that can be compiled to native machine code. When using numba I would also recommend writing plain loops instead of vectorised code, because loops are easier for the compiler to handle.
I have modified your code a bit.
import numpy as np

def decompress_RLE(labelsRleCompressed, vox_size):
    # flat output buffer for the full volume
    res = np.empty(vox_size[0]*vox_size[1]*vox_size[2], np.uint32)
    ii = 0
    # the compressed data is a sequence of (value, repetition count) pairs
    for i in range(0, labelsRleCompressed.size, 2):
        value = labelsRleCompressed[i]
        rep = labelsRleCompressed[i+1]
        for j in range(0, rep):
            res[ii] = value
            ii = ii + 1
    res = res.reshape((vox_size[0], vox_size[1], vox_size[2]))
    return res
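As a quick sanity check (my own addition, not from the original answer), the function can be tried on a tiny hand-made RLE sequence:

small = np.array([7, 3, 2, 5], dtype=np.uint32)   # value 7 repeated 3 times, value 2 repeated 5 times
small_vox = np.array([2, 2, 2], dtype=np.int32)   # 2*2*2 = 8 voxels in total
print(decompress_RLE(small, small_vox).ravel())   # [7 7 7 2 2 2 2 2]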
Create data for benchmarking
vox_size = np.array((300, 300, 300), dtype=np.int32)
# create some compressed data: alternating (value, repetition count) entries
labelsRleCompressed = np.random.randint(0, 500,
    size=vox_size[0]*vox_size[1]*vox_size[2]//2, dtype=np.uint32)
# fix every repetition count to 4 so the expanded data exactly fills the volume
labelsRleCompressed[1::2] = 4
Simply calling the pure-Python function with the generated data results in a runtime of 7.5 seconds, which is rather poor performance.
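For reference, a timing like that can be reproduced with a minimal sketch such as the following (exact numbers will of course depend on your machine):

import time

start = time.perf_counter()
labels = decompress_RLE(labelsRleCompressed, vox_size)
print(time.perf_counter() - start)   # about 7.5 s for the pure-Python version on the test machine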
Now let's use numba.
import numba
# compile with an explicit signature; stick to the data types given there when calling
nb_decompress_RLE = numba.jit("uint32[:,:,:](uint32[:],int32[:])", nopython=True)(decompress_RLE)
Calling the compiled nb_decompress_RLE with the test data results in a runtime of 0.0617 seconds. A nice speed-up by a factor of 119! Simply copying an array of the same size with np.copy is only 3 times faster.
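A short usage sketch (assuming the benchmark arrays from above are still defined) that calls the compiled function and checks it against the pure-Python result:

labels_fast = nb_decompress_RLE(labelsRleCompressed, vox_size)
labels_slow = decompress_RLE(labelsRleCompressed, vox_size)
assert np.array_equal(labels_fast, labels_slow)   # identical output, roughly two orders of magnitude faster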