
Sklearn Joblib Load Function IO Error From AWS S3

I am trying to load a pkl dump of my classifier from scikit-learn. The joblib dump gives much better compression than the cPickle dump for my object, so I would like to stick with it …

Solution 1:

joblib.load() expects the name of a file present on the local filesystem.

Signature: joblib.load(filename, mmap_mode=None)
Parameters
----------
filename: string
    The name of the file from which to load the object
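
For reference, a local round trip with joblib looks like this (a minimal sketch; the iris dataset, the classifier choice, and the /tmp path are just placeholders):

from sklearn.externals import joblib   # on newer scikit-learn versions, use `import joblib` directly
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

# fit a toy classifier, dump it to a local path, and load it back
X, y = load_iris(return_X_y=True)
clf = LogisticRegression().fit(X, y)

joblib.dump(clf, '/tmp/classifier.pkl')
restored = joblib.load('/tmp/classifier.pkl')   # works because the file exists locally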

Moreover, making all your resources public might not be a good idea for your other assets, even if you don't mind the pickled model being accessible to the world.

It is rather simple to copy the object from S3 to the local filesystem of your worker first:

from boto.s3.connection import S3Connection
from sklearn.externals import joblib
import os

# AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY and keys.AWS_BUCKET_NAME come from your own configuration
s3_connection = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
s3_bucket = s3_connection.get_bucket(keys.AWS_BUCKET_NAME)

# download the pickled classifier to a temporary local file, load it, then clean up
local_file = '/tmp/classifier.pkl'
s3_bucket.get_key(aws_app_assets + 'classifier.pkl').get_contents_to_filename(local_file)
clf = joblib.load(local_file)
os.remove(local_file)
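
If you are on the newer boto3 SDK instead of boto, the same copy-then-load pattern looks like this (a sketch; the bucket name and key below are placeholders for your own):

import os
import boto3
from sklearn.externals import joblib

s3 = boto3.client('s3')   # credentials are picked up from the environment or an IAM role

# download the pickled model to a temporary local file, load it, then remove it
local_file = '/tmp/classifier.pkl'
s3.download_file('my-bucket', 'assets/classifier.pkl', local_file)
clf = joblib.load(local_file)
os.remove(local_file)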

Hope this helped.

P.S. You can use this approach to pickle an entire sklearn pipeline, which also includes feature imputation (see the sketch below). Just beware of version conflicts between the libraries used for training and prediction.
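
For instance, a pipeline with an imputation step can be dumped and restored as a single object (a sketch; the toy data, the mean-imputation strategy, and the logistic regression are illustrative choices):

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Imputer        # SimpleImputer in scikit-learn >= 0.20
from sklearn.linear_model import LogisticRegression
from sklearn.externals import joblib

# toy data with missing values, so the fitted imputation travels with the model
X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, 6.0], [4.0, np.nan]])
y = np.array([0, 0, 1, 1])

pipeline = Pipeline([
    ('impute', Imputer(strategy='mean')),
    ('clf', LogisticRegression()),
])
pipeline.fit(X, y)

joblib.dump(pipeline, '/tmp/pipeline.pkl', compress=3)
restored = joblib.load('/tmp/pipeline.pkl')
print(restored.predict(X))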
