Shared Memory Among Processes For Pre-trained Word2vec Model?

March 31, 2024

I have a look-up object, specifically a pre-trained word2vec model from gensim.models.keyedvectors.Word2VecKeyedVectors. I need to do some data pre-processing and I am using multi-

Solution 1:

Yes, if:

the files were saved using Gensim's native .save() method, and the large arrays of vectors are stored as separate .npy files alongside the main file
the files are loaded using Gensim's native .load() method, with the mmap='r' option
you avoid any operation that inadvertently causes a process's object to reallocate the backing array in full, which breaks the mmap sharing.
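The three conditions above can be sketched roughly as follows. This is a minimal sketch, not the answer's own code: the helper names (save_for_sharing, sidecar_arrays, load_shared) and the path are made up for illustration, and kv is assumed to be an already loaded Gensim KeyedVectors object.

```python
import glob


def sidecar_arrays(path):
    """List the separate .npy files written next to a saved model at `path`."""
    # Matches names such as "<path>.vectors.npy"; glob.escape() protects any
    # special characters in the path itself.
    return sorted(glob.glob(glob.escape(path) + ".*.npy"))


def save_for_sharing(kv, path):
    """Save with Gensim's native format and report the sidecar arrays."""
    kv.save(path)  # native save, not save_word2vec_format()
    return sidecar_arrays(path)


def load_shared(path):
    """Open the saved vectors with the large arrays memory-mapped read-only."""
    from gensim.models import KeyedVectors  # imported here so the helpers above work without gensim
    return KeyedVectors.load(path, mmap="r")
```

If sidecar_arrays() returns an empty list, the arrays were pickled inside the main file and there is nothing to memory-map. Gensim only splits arrays out above a size threshold, so this can happen with a very small test model.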

See this prior answer for an overview of the steps/concerns of a similar need.

(The extra steps listed there to avoid breaking the mmap sharing, namely manual patch-ups of the norm properties, should no longer be necessary in Gensim 4.0.0 and later, which was available only as a prerelease when this answer was written.)

Solution 2:

Yes, here are two options:

you can use multiprocessing
or you can use Ray
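For the multiprocessing option, one possible arrangement is to have each worker open the saved vectors itself with mmap, so the workers read the same file-backed pages instead of each receiving a pickled copy. The following is only a sketch under assumptions: the vectors were saved with Gensim's native .save(), and known_tokens is a hypothetical stand-in for whatever pre-processing is actually needed.

```python
import multiprocessing as mp

_KV = None  # one handle per worker process, set by _init_worker


def _init_worker(path):
    """Runs once in each worker: open the saved vectors memory-mapped."""
    global _KV
    from gensim.models import KeyedVectors  # imported inside the worker
    _KV = KeyedVectors.load(path, mmap="r")


def known_tokens(tokens, kv=None):
    """Hypothetical pre-processing step: keep only tokens that have a vector."""
    kv = _KV if kv is None else kv
    return [t for t in tokens if t in kv]


def preprocess(path, docs, processes=4):
    """Map known_tokens over docs, a list of token lists, using a worker pool."""
    with mp.Pool(processes, initializer=_init_worker, initargs=(path,)) as pool:
        return pool.map(known_tokens, docs)
```

On platforms that start workers with spawn, call preprocess() from under an if __name__ == "__main__": guard. Only the path and the token lists are sent to the workers; the vector array itself is never pickled.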
