Shared Memory Among Processes For Pre-trained Word2vec Model?
I have a look-up object, specifically a pre-trained word2vec model from gensim.models.keyedvectors.Word2VecKeyedVectors. I need to do some data pre-processing and I am using multi-
Solution 1:
Yes, if:
- the files were saved using Gensim's internal .save() method, and the relevant large arrays of vectors are clearly separate .npy files
- the files are loaded using Gensim's internal .load() method, with the mmap option
- you avoid doing any operations which inadvertently cause each process's object to reallocate the backing array completely (breaking the mmap-sharing).
See this prior answer for an overview of the steps/concerns of a similar need.
(The concern and extra steps listed there, namely manual patch-ups of the norm properties to avoid breaking the mmap-sharing, should no longer be necessary in Gensim 4.0.0, currently available only as a prerelease version.)
Solution 2:
Yes, here are two options:
- you can use multiprocessing
- or you can use Ray