K-Means Clustering - Output Clusters Contains Same Number Of Elements But In Different Order [ Python ]

I followed this tutorial to perform K-Means clustering on a list of individual words. This is a cricket-based project, so I picked K = 3 so that I can differentiate the t

Solution 1:

Your "clusterlists" is appended to only once, at the end of the code. Correct the indentation of the line that appends to "clusterlists" and it should work.

The indentation in the original post also seems off. Check the indentation after copying and pasting.
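The question's code is not reproduced here, so the following is only a minimal sketch of the kind of indentation slip described above. Only the name "clusterlists" comes from the post; the words, labels, and other variable names are hypothetical.

```python
# Hypothetical data: six words and the cluster label assigned to each.
words = ["bat", "ball", "wicket", "over", "run", "six"]
labels = [0, 1, 2, 0, 1, 2]
k = 3

# Buggy version: the append is dedented out of the loop, so it runs only
# once, after the loop ends, and keeps just the last cluster's members.
clusterlists_buggy = []
for cluster_id in range(k):
    members = [w for w, lab in zip(words, labels) if lab == cluster_id]
clusterlists_buggy.append(members)

# Fixed version: the append is indented into the loop body, so one list
# is stored per cluster.
clusterlists = []
for cluster_id in range(k):
    members = [w for w, lab in zip(words, labels) if lab == cluster_id]
    clusterlists.append(members)

print(clusterlists_buggy)
print(clusterlists)
```

In the buggy version only one list ever reaches "clusterlists", which matches the symptom of it being appended to once at the end.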


Solution 2:

A short time ago I tested some code that clusters text. It's somewhat unorthodox to calculate distances between texts, but you can do it if you really want to.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

documents = ["This little kitty came to play when I was eating at a restaurant.",
             "Merley has the best squooshy kitten belly.",
             "Google Translate app is incredible.",
             "If you open 100 tab in google you get a smiley face.",
             "Best cat photo I've ever taken.",
             "Climbing ninja cat.",
             "Impressed with google map feedback.",
             "Key promoter extension for Google Chrome."]

# Convert the documents to TF-IDF vectors, dropping English stop words.
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(documents)

# Note: with 8 documents, k = 8 leaves roughly one document per cluster;
# use a smaller k to group related documents together.
true_k = 8
model = KMeans(n_clusters=true_k, init='k-means++', max_iter=1000, n_init=1)
model.fit(X)

print("Top terms per cluster:")
# For each centroid, sort term indices by descending weight.
order_centroids = model.cluster_centers_.argsort()[:, ::-1]
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out().
terms = vectorizer.get_feature_names_out()
for i in range(true_k):
    print("Cluster %d:" % i, end='')
    for ind in order_centroids[i, :10]:
        print(' %s' % terms[ind], end='')
    print()

print("\n")
print("Prediction")

# New text must be transformed with the already fitted vectorizer.
Y = vectorizer.transform(["chrome browser to open."])
prediction = model.predict(Y)
print(prediction)

Y = vectorizer.transform(["My cat is hungry."])
prediction = model.predict(Y)
print(prediction)

Just modify that to suit your specific needs.
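One possible modification, sketched here under stated assumptions, is to collect the documents that fall into each cluster by reading the fitted model's `labels_` attribute. The choices of k = 2, `n_init=10`, and `random_state=0`, the shortened document list, and the `clusters` dictionary are illustrative, not part of the answer above.

```python
from collections import defaultdict

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# A shortened copy of the document list, to keep the sketch small.
documents = ["Best cat photo I've ever taken.",
             "Climbing ninja cat.",
             "Impressed with google map feedback.",
             "Key promoter extension for Google Chrome."]

vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(documents)

# Assumed values: k = 2, with a fixed seed so repeated runs agree.
model = KMeans(n_clusters=2, init='k-means++', n_init=10, random_state=0)
model.fit(X)

# model.labels_[i] is the cluster index assigned to documents[i].
clusters = defaultdict(list)
for doc, label in zip(documents, model.labels_):
    clusters[int(label)].append(doc)

for label in sorted(clusters):
    print("Cluster %d:" % label)
    for doc in clusters[label]:
        print("  " + doc)
```

Cluster indices are arbitrary labels, so without a fixed `random_state` the same grouping may appear under different cluster numbers from one run to the next.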

