
How To Save Big (not Huge) Dictionaries In Python?

My dictionary will consist of several thousand keys, each with a 1000x1000 numpy array as its value. The file doesn't need to be human readable; what matters is small size and fast loading.

Solution 1:

If you have a dictionary where the keys are strings and the values are arrays, like this:

>>> import numpy
>>> arrs = {'a': numpy.array([1,2]),
            'b': numpy.array([3,4]),
            'c': numpy.array([5,6])}

You can use numpy.savez to save them, by key, to a single .npz archive file:

>>> numpy.savez('file.npz', **arrs)

To load it back:

>>> npzfile = numpy.load('file.npz')
>>> npzfile
<numpy.lib.npyio.NpzFile object at 0x1fa7610>
>>> npzfile['a']
array([1, 2])
>>> npzfile['b']
array([3, 4])
>>> npzfile['c']
array([5, 6])
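
If you need a plain dictionary again after loading, or if you do want compression on disk, a short follow-up (numpy.savez_compressed is the zip-compressed counterpart of savez and takes the same arguments):

>>> # rebuild an ordinary dict from the loaded archive
>>> arrs_loaded = {key: npzfile[key] for key in npzfile.files}
>>> # same layout, but zip-compressed on disk
>>> numpy.savez_compressed('file_compressed.npz', **arrs)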

Solution 2:

The filesystem itself is often an underappreciated data structure. You could keep a dictionary that maps your keys to filenames, with each file holding one 1000x1000 array. Pickling that index dictionary is quick and easy, and the data files themselves contain raw array data that numpy can load directly.
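
A minimal sketch of that approach, assuming the keys are safe to use as file names (the helpers save_dict and load_array below are just illustrative names, not part of any library):

import os
import pickle
import numpy as np

def save_dict(d, directory, index_path='index.pkl'):
    # write each value to its own .npy file (raw binary that numpy loads quickly)
    os.makedirs(directory, exist_ok=True)
    index = {}
    for key, arr in d.items():
        path = os.path.join(directory, '%s.npy' % key)
        np.save(path, arr)
        index[key] = path
    # pickle only the small key -> filename map, not the arrays themselves
    with open(index_path, 'wb') as f:
        pickle.dump(index, f)

def load_array(key, index_path='index.pkl'):
    # load one array on demand instead of reading everything into memory
    with open(index_path, 'rb') as f:
        index = pickle.load(f)
    return np.load(index[key])

A nice property of this layout is that you can read a single array without touching the rest of the data.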


Solution 3:

How about numpy.savez? It can save multiple numpy arrays in a single binary file, so it should be faster than pickle.


Solution 4:

Google's Protocol Buffers (Protobuf) format is designed to keep serialization overhead very low. I'm not sure how fast it is at (de)serializing, but being Google, I imagine it's not shabby.


Solution 5:

You can use PyTables (http://www.pytables.org/moin) and save your data in HDF5 format.
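
A minimal sketch of what that could look like, assuming the PyTables 3.x API (open_file/create_array) and keys that are valid Python identifiers; the file name arrays.h5 is just an example:

import numpy as np
import tables

arrs = {'a': np.random.rand(1000, 1000),
        'b': np.random.rand(1000, 1000)}

# write each array as a separate node under the HDF5 file's root group
with tables.open_file('arrays.h5', mode='w') as f:
    for key, arr in arrs.items():
        f.create_array(f.root, key, arr)

# read back one array by name without loading the others
with tables.open_file('arrays.h5', mode='r') as f:
    a = f.root.a.read()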

