How to Get the Top 3 or Top N Predictions Using sklearn's SGDClassifier

June 07, 2023

from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np
from sklearn import linear_model
arr=['dogs cats lions','apple pineapple orange','water fire earth ai

Solution 1:

There is no built-in function, but what is wrong with

probs = clf.predict_proba(test)
best_n = np.argsort(probs, axis=1)[-n:]

?

As suggested in one of the comments, [-n:] should be changed to [:,-n:], so that the slice takes the last n columns of every row instead of the last n rows:

probs = clf.predict_proba(test)
best_n = np.argsort(probs, axis=1)[:,-n:]
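The values in best_n are column indices, and the columns of predict_proba follow the order of clf.classes_, so the indices still have to be mapped to labels. Here is a minimal sketch of that step, which also shows the shape difference between the two slices. The classes and probs arrays are made-up stand-ins for clf.classes_ and clf.predict_proba(test), not output from a real model.

```python
import numpy as np

# Hypothetical stand-ins for clf.classes_ and clf.predict_proba(test)
classes = np.array(["animals", "elements", "fruits"])
probs = np.array([
    [0.70, 0.10, 0.20],
    [0.05, 0.35, 0.60],
    [0.25, 0.60, 0.15],
])
n = 2

# [-n:] slices rows: it keeps the last n samples, with all classes
wrong = np.argsort(probs, axis=1)[-n:]
print(wrong.shape)   # (2, 3)

# [:, -n:] slices columns: n best class indices for every sample,
# ordered from lower to higher probability
best_n = np.argsort(probs, axis=1)[:, -n:]
print(best_n.shape)  # (3, 2)

# Indexing the label array with the index matrix yields the class labels
best_n_labels = classes[best_n]
print(best_n_labels)
```

With a fitted classifier, the same indexing would be clf.classes_[best_n].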

Solution 2:

I know this has been answered...but I can add a bit more...

import numpy as np

# Both preds and truths have the same shape, m by n
# (m is the number of predictions and n is the number of classes)
def top_n_accuracy(preds, truths, n):
    best_n = np.argsort(preds, axis=1)[:, -n:]
    ts = np.argmax(truths, axis=1)
    successes = 0
    for i in range(ts.shape[0]):
        if ts[i] in best_n[i, :]:
            successes += 1
    return float(successes) / ts.shape[0]

It's quick and dirty, but I find it useful. One can add their own error checking, etc.
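For a quick sanity check of the function above, here is a sketch of a loop-free variant run on a tiny hand-made example. The name top_n_accuracy_vectorized and the data are illustrative assumptions, not part of the original answer. It expects one-hot truths, like the original.

```python
import numpy as np

def top_n_accuracy_vectorized(preds, truths, n):
    # Indices of the n highest-scoring classes for each row
    best_n = np.argsort(preds, axis=1)[:, -n:]
    # True class index for each row (truths is one-hot)
    ts = np.argmax(truths, axis=1)
    # A row is a hit if its true class appears among its n best indices
    hits = (best_n == ts[:, None]).any(axis=1)
    return float(hits.mean())

# Made-up scores for 4 samples and 3 classes
preds = np.array([
    [0.1, 0.6, 0.3],  # true class 1 is ranked 1st
    [0.5, 0.2, 0.3],  # true class 2 is ranked 2nd
    [0.2, 0.7, 0.1],  # true class 2 is ranked 3rd
    [0.6, 0.3, 0.1],  # true class 0 is ranked 1st
])
truths = np.array([
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
    [1, 0, 0],
])

top1 = top_n_accuracy_vectorized(preds, truths, 1)
top2 = top_n_accuracy_vectorized(preds, truths, 2)
top3 = top_n_accuracy_vectorized(preds, truths, 3)
print(top1, top2, top3)  # 0.5 0.75 1.0
```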

Solution 3:

Hopefully, Andreas will help with this. predict_proba is not available when loss='hinge'. To get the top n classes when loss='hinge', do:

from sklearn.calibration import CalibratedClassifierCV

calibrated_clf = CalibratedClassifierCV(clfSDG, cv=3, method='sigmoid')
model = calibrated_clf.fit(train.data, train.label)

probs = model.predict_proba(test_data)
sorted( zip( calibrated_clf.classes_, probs[0] ), key=lambda x:x[1] )[-n:]

Not sure if clfSDG.predict and calibrated_clf.predict will always predict the same class.
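One way to explore that question is to measure it on your own data. The sketch below does so on synthetic data from make_classification, which is only an assumption for illustration. The names clfSDG and calibrated_clf follow the snippet above, and the uncalibrated classifier is fitted separately because CalibratedClassifierCV fits clones when cv=3. No particular agreement rate is claimed here.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic 4-class data, purely for illustration
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=10,
    n_classes=4, random_state=0,
)

clfSDG = SGDClassifier(loss="hinge", random_state=0)
calibrated_clf = CalibratedClassifierCV(clfSDG, cv=3, method="sigmoid")
calibrated_clf.fit(X, y)

# Top-n class labels per sample, most probable first
n = 3
probs = calibrated_clf.predict_proba(X)
top_n = calibrated_clf.classes_[np.argsort(-probs, axis=1)[:, :n]]

# Fit the plain hinge-loss classifier on its own and compare predictions
clfSDG.fit(X, y)
agreement = np.mean(clfSDG.predict(X) == calibrated_clf.predict(X))

print("top-n labels for the first sample:", top_n[0])
print("fraction of samples where both predict the same class:", agreement)
```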

Solution 4:

argsort returns indices in ascending order. To avoid extra loops or confusion, you can use a simple trick:

probs = clf.predict_proba(test)
best_n = np.argsort(-probs, axis=1)[:, :n]

Negating the probabilities reverses the sort order, so the first n columns hold the top-n classes in descending order of probability.
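A small sketch comparing the two slicing styles on made-up probabilities (the probs array is an assumption, not model output). When there are no ties, reversing the ascending slice gives the same result as the negation trick.

```python
import numpy as np

# Hypothetical probabilities for 2 samples and 4 classes (no ties)
probs = np.array([
    [0.10, 0.50, 0.15, 0.25],
    [0.40, 0.05, 0.35, 0.20],
])
n = 3

# Last n columns of an ascending sort: best class comes last
ascending = np.argsort(probs, axis=1)[:, -n:]

# First n columns after negating: best class comes first
descending = np.argsort(-probs, axis=1)[:, :n]

print(ascending)   # [[2 3 1], [3 2 0]]
print(descending)  # [[1 3 2], [0 2 3]]
```

If some probabilities are tied, the two approaches may order the tied classes differently.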

Solution 5:

As @FredFoo described in "How do I get indices of N maximum values in a NumPy array?", a faster method would be to use argpartition:

Newer NumPy versions (1.8 and up) have a function called argpartition for this. To get the indices of the four largest elements, do

>>> a = np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
>>> a
array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
>>> ind = np.argpartition(a, -4)[-4:]
>>> ind
array([1, 5, 8, 0])
>>> a[ind]
array([4, 9, 6, 9])

Unlike argsort, this function runs in linear time in the worst case, but the returned indices are not sorted, as can be seen from the result of evaluating a[ind]. If you need that too, sort them afterwards:

>>> ind[np.argsort(a[ind])]
array([1, 8, 5, 0])

To get the top-k elements in sorted order in this way takes O(n + k log k) time.
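The quoted example works on a 1-D array. For a 2-D probability matrix such as the output of predict_proba, the same idea can be applied per row, as in this sketch. The probs array is made up, and np.take_along_axis assumes NumPy 1.15 or newer.

```python
import numpy as np

# Hypothetical probabilities for 2 samples and 4 classes
probs = np.array([
    [0.10, 0.50, 0.15, 0.25],
    [0.40, 0.05, 0.35, 0.20],
])
n = 2

# Indices of the n largest values per row, in no particular order
part = np.argpartition(probs, -n, axis=1)[:, -n:]

# Sort only those n candidates by descending probability
part_probs = np.take_along_axis(probs, part, axis=1)
order = np.argsort(-part_probs, axis=1)
top_n = np.take_along_axis(part, order, axis=1)

print(top_n)  # [[1 3], [0 2]]
```

This only pays off when the number of classes is large. For a handful of classes, a full argsort is likely just as fast.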
