Python Pandas Memoryerror
I have these packages installed: python: 2.7.3.final.0, python-bits: 64, OS: Linux, machine: x86_64, processor: x86_64, byteorder: little, pandas: 0.13.1. This is the dataframe info: <
Solution 1:
I can also reproduce it on 0.13.1, but the issue does not occur in 0.12 or in 0.14 (released yesterday), so it seems to be a bug in 0.13. So maybe try upgrading your pandas version: on 0.14 the vectorized way is much faster than apply (5 s vs. more than 1 min on my machine) and uses less peak memory (about 200 MiB vs. 980 MiB, measured with %memit).
Using your sample data repeated 50000 times (giving a df of 450k rows) and the apply_id function of @jsalonen:
In [23]: pd.__version__
Out[23]: '0.14.0'
In [24]: %timeit df_train['Store'].astype(str) +'_' + df_train['Dept'].astype(str)+'_'+ df_train['Date_Str'].astype(str)
1 loops, best of 3: 5.42 s per loop
In [25]: %timeit df_train.apply(apply_id, 1)
1 loops, best of 3: 1min 11s per loop
In [26]: %load_ext memory_profiler
In [27]: %memit df_train['Store'].astype(str) +'_' + df_train['Dept'].astype(str)+'_'+ df_train['Date_Str'].astype(str)
peak memory: 201.75 MiB, increment: 0.01 MiB
In [28]: %memit df_train.apply(apply_id, 1)
peak memory: 982.56 MiB, increment: 780.79 MiB
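For anyone who wants to try the two approaches side by side, here is a minimal sketch. The column names come from the session above, but the sample rows are made up, and apply_id is only a guess at what @jsalonen's function does (it is not shown here), so treat both as assumptions.

```python
import pandas as pd

# Tiny made-up sample; only the column names are taken from the session above.
df_train = pd.DataFrame({
    "Store": [1, 1, 2],
    "Dept": [1, 2, 1],
    "Date_Str": ["2010-02-05", "2010-02-12", "2010-02-05"],
})


def apply_id(row):
    # Hypothetical stand-in for @jsalonen's function: builds the id one row at a time.
    return "{}_{}_{}".format(row["Store"], row["Dept"], row["Date_Str"])


# Vectorized: convert each column to str once, then concatenate whole columns.
vectorized_ids = (
    df_train["Store"].astype(str) + "_"
    + df_train["Dept"].astype(str) + "_"
    + df_train["Date_Str"].astype(str)
)

# Row-wise: calls apply_id once per row (the slower variant in the timings above).
applied_ids = df_train.apply(apply_id, axis=1)

print(vectorized_ids.tolist())
```

Both variants should produce the same ids; the difference reported above is in run time and peak memory, not in the result.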